Information retrieval of digitized medieval manuscripts
Author(s)
Publisher(s)
Publication date
2013
Abstract
This dissertation investigates the retrieval of noisy texts in general and of digitized historical manuscripts in particular. The noise originates from several sources: imperfect text recognition (6% word error rate), spelling variation, non-standardized grammar, and user-side confusion arising from the user's limited knowledge of the underlying language and/or the searched text. Manual correction and normalization are time-consuming, resource-demanding tasks and are thus out of the question. Furthermore, external resources such as thesauri are not available for older, lesser-known languages. In this dissertation, we present our contributions to overcoming, or at least coping with, these issues. We developed several methods that provide a low-cost yet highly effective text representation, limiting the negative impact of recognition errors and of variable orthography and morphology. Finally, to account for the user-confusion problem, we developed a low-cost query enrichment function, which we deem indispensable for the challenging task of one-word queries.
Notes
Keywords: information retrieval of noisy texts, information retrieval of handwritten documents, OCR, text recognition, digital libraries, Middle High German, medieval manuscripts
Doctoral thesis: Université de Neuchâtel, 2013
Identifiers
Publication type
doctoral thesis
File(s) to download