Publication
Metadata only

Searching strategies for the Hungarian language

2008, Savoy, Jacques

This paper reports on the underlying IR problems encountered when dealing with the complex morphology and compound constructions found in the Hungarian language. It describes evaluations carried out on two general stemming strategies for this language, and also demonstrates that a light stemming approach could be quite effective. Based on searches done on the CLEF test collection, we find that a more aggressive suffix-stripping approach may produce better MAP. When compared to an IR scheme without stemming or one based on only a light stemmer, we find the differences to be statistically significant. When compared with probabilistic, vector-space and language models, we find that the Okapi model results in the best retrieval effectiveness. The resulting MAP is found to be about 35% better than the classical tf-idf approach, particularly for very short requests. Finally, we demonstrate that applying an automatic decompounding procedure for both queries and documents significantly improves IR performance (+10%), compared to word-based indexing strategies. (c) 2007 Elsevier Ltd. All rights reserved.
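
The abstract contrasts the Okapi model with the classical tf-idf scheme. A minimal sketch of both scoring functions follows, assuming documents and queries are already tokenized (and stemmed) lists of terms, and using the commonly cited defaults k1 = 1.2 and b = 0.75; these parameter values are assumptions, not settings reported in the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query.

    k1 and b are assumed common defaults, not values taken from the paper.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Robertson-Sparck Jones idf; the +1 keeps it positive for frequent terms.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def tfidf_scores(query, docs):
    """Classical tf-idf baseline: raw term frequency times log(N / df)."""
    n = len(docs)
    df = Counter(term for d in docs for term in set(d))
    return [
        sum(Counter(d)[t] * math.log(n / df[t]) for t in query if t in d)
        for d in docs
    ]
```

In BM25, k1 controls how quickly repeated occurrences of a term stop adding weight, and b controls how strongly document length is normalized; the tf-idf baseline has neither mechanism.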

Publication
Metadata only

Cross-language information retrieval: experiments based on CLEF 2000 corpora

2003, Savoy, Jacques

Search engines play an essential role in the usability of Internet-based information systems and without them the Web would be much less accessible, and at the very least would develop at a much slower rate. Given that non-English users now tend to make up the majority in this environment, our main objective is to analyze and evaluate the retrieval effectiveness of various indexing and search strategies based on test-collections written in four different languages: English, French, German, and Italian. Our second objective is to describe and evaluate various approaches that might be implemented in order to effectively access document collections written in another language. As a third objective, we will explore the underlying problems involved in searching document collections written in the four different languages, and we will suggest and evaluate different database merging strategies capable of providing the user with a single result list. (C) 2002 Published by Elsevier Science Ltd.
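
The abstract mentions merging strategies without detailing them. Two simple baselines for turning several per-language result lists into a single list, round-robin interleaving and normalized-score merging (each list's scores divided by that list's top score), can be sketched as follows. They are illustrative assumptions, not necessarily the strategies evaluated in the paper. Each result list is assumed to be a ranked list of (doc_id, score) pairs with positive scores, and document ids are assumed not to repeat across collections.

```python
from itertools import zip_longest


def round_robin_merge(result_lists):
    """Interleave ranked lists: every list's 1st hit, then every 2nd hit, etc."""
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):  # shorter lists are padded with None
        for hit in tier:
            if hit is not None and hit[0] not in seen:
                seen.add(hit[0])
                merged.append(hit[0])
    return merged


def normalized_score_merge(result_lists):
    """Divide each list's scores by its top score, then sort all hits together."""
    pooled = []
    for hits in result_lists:
        if not hits:
            continue
        top = max(score for _, score in hits) or 1.0  # assumes positive scores
        pooled.extend((doc, score / top) for doc, score in hits)
    # Stable sort: ties keep the order in which the lists were supplied.
    pooled.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in pooled]
```

With hypothetical lists english = [("EN-1", 12.0), ("EN-2", 6.0)] and french = [("FR-1", 3.0), ("FR-2", 2.7)], merging on raw scores would place both English hits first only because that collection produces larger scores; normalized-score merging returns EN-1, FR-1, FR-2, EN-2.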

Publication
Metadata only

Bibliographic database access using free-text and controlled vocabulary: an evaluation

2005, Savoy, Jacques

This paper evaluates and compares the retrieval effectiveness of various search models, based on either automatic text-word indexing or on manually assigned controlled descriptors. Retrieval is from a relatively large collection of bibliographic material written in French. Moreover, for this French collection we evaluate improvements that result from combining automatic and manual indexing. First, when considering various contexts, this study reveals that the combined indexing strategy always obtains the best retrieval performance. Second, when users wish to conduct exhaustive searches with minimal effort, we demonstrate that manually assigned terms are essential. Third, the evaluations presented in this study reveal the comparative retrieval performances that result from manual and automatic indexing in a variety of circumstances. (c) 2004 Elsevier Ltd. All rights reserved.
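
The abstract does not state how the automatic and manual runs were combined. One common option, sketched here purely as an assumption, is CombSUM: min-max normalize each run's scores to the 0 to 1 range, then add the normalized scores per document, so documents retrieved by both the free-text and the controlled-vocabulary run rise in the ranking. The run contents below are hypothetical.

```python
def min_max(run):
    """Rescale a {doc_id: score} run to the 0..1 range."""
    lo, hi = min(run.values()), max(run.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {doc: (score - lo) / span for doc, score in run.items()}


def comb_sum(*runs):
    """CombSUM fusion: sum of min-max normalized scores, best first."""
    fused = {}
    for run in runs:
        for doc, score in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical runs: free-text indexing vs. manually assigned descriptors.
text_run = {"d1": 8.0, "d2": 5.0, "d3": 2.0}
descriptor_run = {"d3": 5.0, "d1": 3.0, "d4": 1.0}
```

Here comb_sum(text_run, descriptor_run) ranks d1 first, because it scores well in both runs, and also surfaces d4, which only the descriptor run retrieves.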

Publication
Metadata only

A stemming procedure and stopword list for general French corpora

1999, Savoy, Jacques

Due to the increasing use of network-based systems, there is a growing interest in access to, and search mechanisms for, text databases in languages other than English. To adapt search systems to foreign languages whose characteristics are similar to those of English, all we need to do for the most part is establish a general stopword list and a stemming procedure. This article presents the tools needed to establish these for French-language databases, together with retrieval experiments carried out on two medium-sized French-language test collections. These experiments were conducted to evaluate the retrieval effectiveness of the proposed tools.
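
As a rough illustration of combining a stopword list with a light stemmer for French, the sketch below drops a handful of function words and strips the -aux plural, the plural -s/-x, and the feminine -e. Both the stopword set and the suffix rules are small hypothetical examples, not the list or the procedure published in the article. Tokenization is plain whitespace splitting, so punctuation and elisions such as l' are not handled.

```python
# Hypothetical mini stopword list; a real list would be far longer.
STOPWORDS = {"le", "la", "les", "de", "des", "du", "un", "une", "et", "en", "à", "au", "aux"}


def light_stem(word):
    """Strip a few inflectional endings (illustrative rules only)."""
    if len(word) > 5 and word.endswith("aux"):
        return word[:-3] + "al"  # chevaux -> cheval, journaux -> journal
    if len(word) > 3 and word[-1] in "sx":
        word = word[:-1]  # plural -s / -x
    if len(word) > 3 and word.endswith("e"):
        word = word[:-1]  # feminine -e
    return word


def index_terms(text):
    """Lowercase, split on whitespace, drop stopwords, then stem."""
    return [light_stem(t) for t in text.lower().split() if t not in STOPWORDS]
```

With these rules, données, donnée and donné all map to the same index term, and index_terms("Les chevaux et les journaux") keeps only cheval and journal.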

Publication
Metadata only

Combining multiple strategies for effective monolingual and cross-language retrieval

2004, Savoy, Jacques

This paper describes and evaluates different retrieval strategies that are useful for search operations on document collections written in various European languages, namely French, Italian, Spanish and German. We also suggest and evaluate different query translation schemes based on freely available translation resources. In order to cross language barriers, we propose a combined query translation approach that has resulted in interesting retrieval effectiveness. Finally, we suggest a collection merging strategy based on logistic regression that tends to perform better than other merging approaches.
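
The abstract names logistic regression as the merging strategy but does not give the model. A minimal sketch, assuming the explanatory variables are the natural log of a document's rank and its retrieval score within its own result list, estimates a probability of relevance for every hit and sorts all hits by that estimate. The per-collection coefficients in COEFS are hypothetical placeholders; in practice they would be fitted on training queries with relevance judgments.

```python
import math

# Hypothetical (intercept, ln-rank weight, score weight) per collection.
COEFS = {"EN": (1.2, -0.8, 0.1), "FR": (0.9, -0.7, 0.3)}


def relevance_probability(intercept, w_rank, w_score, rank, score):
    """Logistic model: P(relevant) = 1 / (1 + exp(-z))."""
    z = intercept + w_rank * math.log(rank) + w_score * score
    return 1.0 / (1.0 + math.exp(-z))


def logistic_merge(result_lists, coefs):
    """Merge {collection: [(doc_id, score), ...]} by estimated relevance."""
    pooled = []
    for collection, hits in result_lists.items():
        a, b, c = coefs[collection]
        for rank, (doc, score) in enumerate(hits, start=1):
            pooled.append((relevance_probability(a, b, c, rank, score), doc))
    pooled.sort(reverse=True)  # highest estimated probability first
    return [doc for _, doc in pooled]
```

Because each collection gets its own coefficients, a low raw score from one collection can still outrank a higher raw score from another when the fitted model says it is more likely to be relevant.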

Publication
Metadata only

Statistical inference in retrieval effectiveness evaluation

1997, Savoy, Jacques

Evaluation methodology, and particularly its associated statistical tests, plays a central role in the information retrieval domain, which maintains a strong empirical tradition. In an effort to evaluate the retrieval effectiveness of a search algorithm, this paper focuses on the average precision over a set of fixed recall values. After reviewing traditional evaluation methodology through the use of examples, this study suggests applying another statistical inference methodology called the bootstrap, which requires no particular assumption about the distribution of the observations. Moreover, this scheme may be used to assess the accuracy of virtually any statistic, to build approximate confidence intervals, and to verify whether a statistically significant difference exists between two retrieval schemes, even when dealing with a relatively small sample size. This study also suggests selecting the sample median rather than the sample mean in evaluating retrieval effectiveness, a choice justified by the nature of the information retrieval data. (C) 1997 Elsevier Science Ltd.
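
A minimal percentile-bootstrap sketch of the idea, assuming one average-precision value per query for each of two retrieval schemes: resample the per-query differences with replacement, recompute the median each time, and read an approximate confidence interval off the sorted estimates. If the interval excludes zero, the difference can be treated as significant at level alpha. The function name, seed and resample count are illustrative choices, not the paper's settings.

```python
import random
import statistics


def bootstrap_ci(differences, stat=statistics.median, n_resamples=10_000,
                 alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for stat(differences)."""
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(differences)
    # Each resample draws n per-query differences with replacement.
    estimates = sorted(
        stat(rng.choices(differences, k=n)) for _ in range(n_resamples)
    )
    lo_index = int((alpha / 2) * n_resamples)
    hi_index = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lo_index], estimates[hi_index]
```

Passing stat=statistics.mean gives the interval for the mean instead, which makes it easy to compare the two choices on the same per-query data.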