Searching strategies for the Bulgarian language

Author(s)
Savoy, Jacques 
Institut d'informatique 
Publication date
2007
In
Information Retrieval
Vol.
10
No.
6
From page
509
To page
529
Keywords
  • cross-language inform...
  • Bulgarian IR
  • stemmer
  • evaluation
  • morphology
  • TEXT RETRIEVAL
  • INFORMATION-RETRIEVAL...
  • STEMMING ALGORITHM
  • MODELS
  • WORDS

Abstract
This paper reports on the underlying IR problems encountered when indexing and searching documents written in Bulgarian. For this language we propose a general light stemmer and demonstrate that it can be quite effective, producing a significantly better MAP (around +34%) than an approach that applies no stemming. We implement the GL2 model derived from the Divergence from Randomness paradigm and find its retrieval effectiveness better than that of other probabilistic, vector-space and language models. The resulting MAP is about 50% better than that of the classical tf-idf approach. Moreover, increasing the query size enhances the MAP by around 10% (from T to TD). To compare the retrieval effectiveness of our suggested stopword list and light stemmer for Bulgarian, we conduct a set of experiments with another stopword list and with a more complex and aggressive stemmer. The results tend to indicate that there is no statistically significant difference between these variants and our suggested approach. This paper also evaluates other indexing strategies, such as 4-gram indexing and indexing based on the automatic decompounding of compound words. Finally, we analyze certain queries to discover why we obtained poor results when indexing Bulgarian documents using the suggested word-based approach.
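
To make two of the notions in the abstract concrete, here is a minimal Python sketch of character 4-gram extraction and of mean average precision (MAP), the measure behind the reported percentages. It is an illustration only, not the paper's implementation: the function names (`char_ngrams`, `average_precision`, `mean_average_precision`) are hypothetical, and the choice to take overlapping 4-grams within a single word and to keep shorter words whole is an assumption about how such indexing is commonly done.

```python
from typing import Dict, List, Set


def char_ngrams(word: str, n: int = 4) -> List[str]:
    """Return the overlapping character n-grams of one word.

    Assumption: words of length n or less are kept whole as a single term.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against the relevant document ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision measured at the rank of each relevant document found.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of the per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items())
    return total / len(qrels)
```

For example, `char_ngrams("българия")` yields the five terms `бълг`, `ълга`, `лгар`, `гари` and `ария`, so a query and a document can match on shared character sequences without any stemming step.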
URI
https://libra.unine.ch/handle/123456789/6465
Publication type
Resource Types::text::journal::journal article