Multilingual and domain-specific IR: a case study in cultural heritage
Akasereh, Mitra
Publication date
Nowadays, data collections are available in many different languages and in many different fields. We therefore face a growing need for search systems that handle multilinguality, as well as for professional search systems that allow users to search within a specific field of knowledge.

In this thesis we propose a search system for cultural heritage data. Our data come from different resources located in different countries and are written in various languages. We study the specific structure, characteristics and terminology of data in this field in order to build an effective retrieval system. We evaluate different information retrieval models and indexing strategies on monolingual data to find those that are most effective and best suited to the nature of our data. To deal with different languages, we study each language separately and propose tools such as stemmers for each language, as well as fusion operators to merge the results obtained for different languages. To cross language barriers easily, we study different translation methods. Moreover, to enhance the search results, we investigate different query expansion techniques.

Based on our results, we propose using models from the DFR family for English and the Okapi model for French and Polish, along with a light stemmer. To cross the language barrier, we propose using a combination of translation methods. In our multilingual tests, the Z-score operator performed best when merging results from different languages. Finally, we propose applying query expansion using an external source to improve search performance.
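The abstract reports that the Z-score operator performed best for merging per-language result lists. The following is a minimal Python sketch of Z-score fusion as it is commonly formulated, not the thesis's own implementation: the scores of each result list are standardized by that list's mean and standard deviation, shifted so that the list's lowest score maps to zero, and then summed per document. The function names `zscore_normalize` and `zscore_fuse`, the optional per-list weights, and the toy scores are hypothetical, chosen for illustration only.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple


def zscore_normalize(run: Dict[str, float]) -> Dict[str, float]:
    """Map the raw scores of one result list to shifted z-scores (all >= 0)."""
    if not run:
        return {}
    scores = list(run.values())
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are equal: give every document the same neutral value.
        return {doc: 1.0 for doc in run}
    # Shift by (mean - min) / sigma so the lowest-scoring document maps to 0.
    delta = (mu - min(scores)) / sigma
    return {doc: (s - mu) / sigma + delta for doc, s in run.items()}


def zscore_fuse(
    runs: List[Dict[str, float]], weights: Optional[List[float]] = None
) -> List[Tuple[str, float]]:
    """Merge several result lists (doc id -> raw score) into one ranking."""
    if weights is None:
        weights = [1.0] * len(runs)
    if len(weights) != len(runs):
        raise ValueError("need exactly one weight per result list")
    fused: Dict[str, float] = defaultdict(float)
    for run, weight in zip(runs, weights):
        for doc, z in zscore_normalize(run).items():
            # A document missing from a list simply gets no contribution from it.
            fused[doc] += weight * z
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores from an English and a French run on different scales.
    english = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
    french = {"d2": 0.9, "d4": 0.5, "d1": 0.1}
    # ranking order: d2, d1, d4, d3
    for doc, score in zscore_fuse([english, french]):
        print(doc, round(score, 3))
```

Standardizing each list first is what makes the sum meaningful: the English scores above lie between 4 and 12 while the French ones lie between 0.1 and 0.9, so summing the raw values would let the English run dominate the merged ranking.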
Keywords: domain-specific IR, cultural heritage (CH), query expansion, pseudo-relevance feedback, data fusion, bilingual IR, multilingual IR

Doctoral thesis: Université de Neuchâtel, 2015
Publication type
doctoral thesis