Indexing and stemming approaches for the Czech language

Author(s)
Dolamic, Ljiljana
Savoy, Jacques 
Institut d'informatique 
In
Information Processing and Management, 2009, 45(6), 714-720
Keywords
  • Czech language
  • Stemming
  • Evaluation
  • Slavic languages
Abstract
This paper describes and evaluates various stemming and indexing strategies for the Czech language. Based on a Czech test collection, we designed and evaluated two stemming approaches, a light one and a more aggressive one. We compared them with a no-stemming scheme as well as a language-independent approach (n-gram). To evaluate the suggested solutions we used various IR models, including Okapi, Divergence from Randomness (DFR), a statistical language model (LM), and the classical tf idf vector-space approach. We found that the Divergence from Randomness paradigm tends to yield better retrieval effectiveness than the Okapi, LM or tf idf models; the performance differences were, however, statistically significant only with the last two IR approaches. Ignoring stemming generally reduces the mean average precision (MAP) by more than 40%, and these differences are always significant. Finally, although our more aggressive stemmer tends to show the best performance, the differences in performance with the light stemmer are not statistically significant.
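As a rough illustration of two notions the abstract relies on, the sketch below shows how a word can be split into overlapping character n-grams (the language-independent indexing alternative to stemming) and how mean average precision (MAP) is computed over a set of queries. The function names, the choice of n = 4, and the toy rankings are assumptions made for this example only; they are not taken from the article, whose exact n-gram variant and stemming rules are not given on this page.

```python
from typing import List, Sequence, Set, Tuple


def char_ngrams(word: str, n: int = 4) -> List[str]:
    """Split a word into overlapping character n-grams.

    Words no longer than n are kept whole. n = 4 is an illustrative
    assumption, not a value stated in the abstract.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list.

    Precision is taken at the rank of each relevant document found,
    then averaged over all relevant documents for the query.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical document identifiers, not from the article).
toy_runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d4", "d5"], {"d5"}),              # AP = (1/2) / 1
]
print(char_ngrams("praha"))
print(mean_average_precision(toy_runs))
```

Under this definition, a relative drop in MAP of more than 40% means the no-stemming score falls below 0.6 times the score obtained with stemming.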
Identifiers
https://libra.unine.ch/handle/123456789/9572
10.1016/j.ipm.2009.06.001
Publication type
journal article
File(s) to download
 main article: Dolamic_Ljiljana-Indexing_and_stemming_approaches_for_the_czech_language-20130108.pdf (623.32 KB)