Naji, Nada
- Publication, open access: Information retrieval of digitized medieval manuscripts (2013)
This dissertation investigates the retrieval of noisy texts in general and of digitized historical manuscripts in particular. The noise originates from several sources: imperfect text recognition (6% word error rate), spelling variation, non-standardized grammar, and user-side confusion arising from the user's limited knowledge of the underlying language and/or the searched text. Manual correction and normalization are very time-consuming and resource-demanding tasks and are thus out of the question. Furthermore, external resources, such as thesauri, are not available for the older, lesser-known languages. In this dissertation, we present our contributions to overcoming, or at least coping with, these issues. We developed several methods that provide a low-cost yet highly effective text representation to limit the negative impact of recognition errors and of variable orthography and morphology. Finally, to account for the user-confusion problem, we developed a low-cost query enrichment function, which we deem indispensable for the challenging task of one-word queries.