Searching strategies for the Bulgarian language

Author(s)
Savoy, Jacques  
Institut d'informatique  
Date issued
2007
In
Information Retrieval, Springer, 2007, vol. 10, no. 6, pp. 509-529
Subjects
Cross-language information retrieval; Bulgarian IR; Stemmer; Evaluation; Morphology
Abstract
This paper reports on the underlying IR problems encountered when indexing and searching with the Bulgarian language. For this language we propose a general light stemmer and demonstrate that it can be quite effective, producing significantly better MAP (around +34%) than an approach not applying stemming. We implement the GL2 model derived from the Divergence from Randomness paradigm and find its retrieval effectiveness better than other probabilistic, vector-space and language models. The resulting MAP is found to be about 50% better than the classical tf idf approach. Moreover, increasing the query size enhances the MAP by around 10% (from T to TD). In order to compare the retrieval effectiveness of our suggested stopword list and the light stemmer developed for the Bulgarian language, we conduct a set of experiments on another stopword list and also a more complex and aggressive stemmer. Results tend to indicate that there is no statistically significant difference between these variants and our suggested approach. This paper also evaluates other indexing strategies such as 4-gram indexing and indexing based on the automatic decompounding of compound words. Finally, we analyze certain queries to discover why we obtained poor results when indexing Bulgarian documents using the suggested word-based approach.
Publication type
journal article
Identifiers
https://libra.unine.ch/handle/20.500.14713/60342
DOI
10.1007/s10791-007-9033-9
File(s)
Name
Savoy_Jacques_-_Searching_strategies_for_the_Bulgarian_language_20091208.pdf
Type
Main Article
Size
1.26 MB
Format
Adobe PDF