Showing items 1 - 4 of 4
  • Publication
    Open access
    When stopword lists make the difference
    (2009)
    Dolamic, Ljiljana
    In this brief communication, we evaluate the use of two stopword lists for the English language (one comprising 571 words and another with 9) and compare them with a search approach accounting for all word forms. We show that with the original Okapi form, or with certain models derived from the Divergence from Randomness (DFR) paradigm, using a short stopword list or none at all may result in significantly lower performance. For other DFR models and a revised Okapi implementation, the performance differences between approaches using a short list, a long list, or no list at all are usually not statistically significant. Similar conclusions can be drawn for other natural languages such as French, Hindi, or Persian.
  • Publication
    Open access
    Indexation manuelle et automatique : une évaluation comparative basée sur un corpus en langue française
    This communication evaluates and compares the retrieval effectiveness of various search models based on either automatic text-word indexing or manual indexing, the latter relying on manually assigned descriptors from a controlled vocabulary. The test collection, searched with ten retrieval models, is a relatively large set of bibliographic records written in French and covering various disciplines. Finally, for this French collection we evaluate the improvements that result from combining automatic and manual indexing.
  • Publication
    Open access
    Information retrieval with Hindi, Bengali, and Marathi languages: evaluation and analysis
    Akasereh, Mitra
    ;
    Dolamic, Ljiljana
    Our first objective in participating in the FIRE evaluation campaigns is to analyze the retrieval effectiveness of various indexing and search strategies when dealing with corpora written in Hindi, Bengali, and Marathi. As a second goal, during this second campaign we developed new and more aggressive stemming strategies for both Marathi and Hindi. We compared their retrieval effectiveness with that of a light stemming strategy and a language-independent n-gram approach. As another language-independent indexing strategy, we evaluated the trunc-n method, in which the indexing term is formed from only the first n letters of each word. To evaluate these solutions we used various IR models, including models derived from Divergence from Randomness (DFR), a Language Model (LM), Okapi, and the classical tf-idf vector-processing approach.
    For the three languages studied, our experiments tend to show that IR models derived from the Divergence from Randomness (DFR) paradigm produce the best overall results. For these languages, our experiments also show that either an aggressive stemming procedure or the trunc-n indexing approach yields better retrieval effectiveness than other word-based or language-independent n-gram approaches. Applying the Z-score as a data fusion operator after blind-query expansion also tends to improve the MAP of the merged run over that of the best single IR system.
  • Publication
    Open access
    Ad hoc retrieval with Marathi language
    Akasereh, Mitra
    Our goal in participating in the FIRE 2011 evaluation campaign is to analyse and evaluate the retrieval effectiveness of our retrieval system when applied to the Marathi language. We developed a light and an aggressive stemmer for this language, as well as a stopword list. In our experiments, seven different IR models (language model, DFR-PL2, DFR-PB2, DFR-GL2, DFR-I(ne)C2, tf-idf, and Okapi) were used to evaluate the influence on retrieval performance of these stemmers and of the language-independent n-gram and trunc-n indexing strategies. We also applied a pseudo-relevance feedback (blind-query expansion) approach to estimate its impact on retrieval effectiveness. Our results show that, for Marathi, the DFR-I(ne)C2, DFR-PL2, and Okapi IR models yield the best performance. For this language, the trunc-n indexing strategy gives the best retrieval effectiveness compared with the other stemming and indexing approaches. The adopted pseudo-relevance feedback approach also tends to enhance retrieval effectiveness.
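
The two language-independent indexing strategies named in the abstracts above, trunc-n and n-grams, can be illustrated with a minimal Python sketch. Only the trunc-n definition (keep the first n letters of each word) comes from the abstracts. The function names, the whitespace tokenization, the lowercasing, and the choice of overlapping character n-grams within a word are assumptions made for illustration and are not the authors' implementation.

```python
from typing import List


def trunc_n(word: str, n: int) -> str:
    """trunc-n: form the indexing term from the first n letters of the word."""
    return word[:n]


def char_ngrams(word: str, n: int) -> List[str]:
    """Overlapping character n-grams of a word (assumed variant).

    Words with n letters or fewer are kept whole.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def index_terms(text: str, n: int, strategy: str = "trunc") -> List[str]:
    """Turn a text into indexing terms using one of the two strategies.

    Tokenization is a plain whitespace split after lowercasing (assumption).
    """
    terms: List[str] = []
    for token in text.lower().split():
        if strategy == "trunc":
            terms.append(trunc_n(token, n))
        elif strategy == "ngram":
            terms.extend(char_ngrams(token, n))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return terms


if __name__ == "__main__":
    print(index_terms("Information Retrieval", 4, "trunc"))  # ['info', 'retr']
    print(char_ngrams("index", 4))                           # ['inde', 'ndex']
```

With trunc-n, each word contributes exactly one indexing term, whereas the n-gram variant produces several terms per word; neither requires language-specific stemming rules.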