Publication
Open access

Multi-layered semantic annotation and the formalisation of annotation schemas for the investigation of modality in a Latin corpus

2024, Bermúdez Sabel, Helena, Dell'Oro, Francesca, Marongiu, Paola

This paper stems from the project A World of Possibilities. Modal pathways over an extra-long period of time: the diachrony of modality in the Latin language (WoPoss), which involves a corpus-based approach to the study of modality in the history of the Latin language. Linguistic annotation and, in particular, the semantic annotation of modality is a keystone of the project. Besides the difficulties intrinsic to any annotation task dealing with semantics, our annotation scheme involves multiple interconnected layers of annotation, adding complexity to the task. Considering the intricacies of our fine-grained semantic annotation, we needed to develop well-documented schemas in order to control the consistency of the annotation, but also to enable an efficient reuse of our annotated corpus. This paper presents the different elements involved in the annotation task, and how the description and the relations between the different linguistic components were formalised and documented, combining schema languages with XML documentation.

Publication
Open access

Towards a common model for European Poetry: Challenges and solutions

2022, Bermúdez Sabel, Helena, Díez-Platas, M.L., Ros, Salvador, González-Blanco, Elena

This paper stems from the analysis of multiple poetic resources that were available online, as well as the results of methodological discussions with scholars of European Literature. The goal was to retrieve the informational needs of all these different sources in order to build a common data model for European Poetry (EP). Thus, by implementing a reverse engineering method, we have created the Domain Model for EP, which is an important breakthrough for making existing poetry resources interoperable. The lack of a uniform academic approach to analysing and classifying poetic manifestations and the divergence of theories when comparing poetry schools from different languages and periods are some of the factors that hinder the modelling process. In this paper, we will present some of the challenges we encountered while conceptualizing the information relevant to poetic analysis and how we have worked around them. Some elements of the ontology will be presented to illustrate our modelling strategies.

Publication
Open access

TEI-friendly annotation scheme for medieval named entities: a case on a Spanish medieval corpus

2021, Álvarez-Mellado, Elena, Díez-Platas, M.L., Ruiz Fabo, Pablo, Bermúdez Sabel, Helena, Ros, Salvador, González-Blanco, Elena

Medieval documents are a rich source of historical data. Performing named-entity recognition (NER) on this genre of texts can provide us with valuable historical evidence. However, traditional NER categories and schemes are usually designed with modern documents in mind (e.g. journalistic text) and the general-domain NER annotation schemes fail to capture the nature of medieval entities. In this paper we explore the challenges of performing named-entity annotation on a corpus of Spanish medieval documents: we discuss the mismatches that arise when applying traditional NER categories to such a corpus and we propose a novel humanist-friendly TEI-compliant annotation scheme and guidelines intended to capture the particular nature of medieval entities.

Publication
Open access

Implemented to Be Shared: the WoPoss Annotation of Semantic Modality in a Latin Diachronic Corpus

2020, Dell'Oro, Francesca, Bermúdez Sabel, Helena, Marongiu, Paola

The FNS project A world of possibilities (WoPoss) studies the evolution of modal meanings in the Latin language. Passages expressing modal notions such as ‘possibility’ and ‘necessity’ are annotated following a pipeline that combines both automatic and manual annotation. This paper discusses the creation, annotation and processing of the WoPoss corpus. Texts are first gathered from different online open access resources to create the initial dataset. Due to the heterogeneity of formats and encodings, these texts are regularized before the application of an automatic linguistic annotation. The annotated files are then uploaded to the annotation platform INCEpTION. Through this platform, annotators add the relevant linguistic and semantic information following the WoPoss guidelines. The results of the automatic annotation are also curated. The fine-grained semantic annotation is the core activity of the WoPoss workflow, so this paper focuses on the preparation of files and how the semantic annotation task is tackled.

Publication
Open access

A new corpus annotation framework for Latin diachronic lexical semantics

2022-7-16, McGillivray, Barbara, Kondakova, Daria, Burman, Annie, Dell'Oro, Francesca, Bermúdez Sabel, Helena, Marongiu, Paola, Márquez Cruz, Manuel

We present a new corpus-based resource and methodology for the annotation of Latin lexical semantics, consisting of 2,399 annotated passages of 40 lemmas from the Latin diachronic corpus LatinISE. We also describe how the annotation was designed, analyse annotators’ styles, and present the preliminary results of a study on the lexical semantics and diachronic change of the 40 lemmas. We complement this analysis with a case study on semantic vagueness. As the availability of digital corpora of ancient languages increases, and as computational research develops new methods for large-scale analysis of diachronic lexical semantics, building lexical semantic annotation resources can shed new light on large-scale patterns in the semantic development of lexical items over time. We share recommendations for designing the annotation task that will hopefully help similar research on other less-resourced or historical languages.

Publication
Open access

Setting Up Bilingual Comparable Corpora with Non-Contemporary Languages

2022, Bermúdez Sabel, Helena, Dell'Oro, Francesca, Montrichard, Cyrielle, Rossari, Corinne

This paper presents the project “Les corpora latins et français: une fabrique pour l’accès à la représentation des connaissances” (Latin and French Corpora: a Factory For Accessing Knowledge Representation) whose focus is the study of modality in both Latin and French by means of multi-genre, diachronic comparable corpora. The setting up of such corpora involves a number of conceptualisation challenges, in particular with regard to how to compare two asynchronous textual productions corresponding to different cultural frameworks. In this paper we outline the rationale of designing comparable corpora to explore our research questions and then focus on some of the issues that arise when comparing different diachronic spans of Latin and French. We also explain how these issues were dealt with, thus providing some grounds upon which other projects could build their methodology.

Publication
Open access

Trobadores de corte en corte: Visualización dos centros culturais ibéricos tardomedievais

2021, Bermúdez Sabel, Helena

The dynamics of political negotiations, social conflicts, and relations of power in society are inseparable from the culture it produces. The complexity of the Galician-Portuguese lyric provides a particularly illuminating opportunity to examine how the relations of cultural centers change through the Late Middle Ages by studying the biographic profiles of the creators of texts and their relations of patronage. The following is a proposal to explore from an educational perspective the sociocultural context in which the cultural movement of the troubadours was born and developed.

Publication
Open access

Pygmalion in the classroom: a tool to draw lexicographic diachronic maps and their application to didactics

2022, Dell'Oro, Francesca, Bermúdez Sabel, Helena, Marongiu, Paola

This contribution presents Pygmalion, a tool that facilitates the creation of interactive diachronic maps, and focuses on some of its possible applications to the didactics of languages and linguistics. Pygmalion was conceived in the framework of the project A world of possibilities. Modal pathways over an extra-long period of time: the diachrony of modality in the Latin language (WoPoss). Although its initial conceptualisation was heavily influenced by the research questions of this project and, therefore, the visualisation of modality was a decisive feature, the tool was later redesigned for a broader use. In fact, to increase usability, we offer three different versions to better suit users’ requirements. The primary goal of Pygmalion is to provide scholars, teachers, and learners with an instrument to visually represent the heterogeneous diachronic linguistic information contained in lexicographic works. The conceptualisation of this type of resource raises a twofold objective: while we need to address the difficulties of designing a visualisation that illustrates complex concepts, such as semantic shifts and meaning relations, it is crucial to ensure the readability of the data through a user-friendly and intuitive tool.

Publication
Open access

L’édition numérique au service de la philologie matérielle. Modèles de la lyrique galégo-portugaise

2022, Bermúdez Sabel, Helena

Publication
Open access

Reviewing the bread and butter of CoReMa, Cooking Recipes of the Middle Ages

2021, Bermúdez Sabel, Helena

CoReMa, Cooking Recipes of the Middle Ages is an ongoing project whose method involves a semantically annotated, conservative edition of medieval manuscripts containing cooking recipes. With a rigorous philological approach and the aid of semantic web technologies, CoReMa’s method aims at teasing out the textual relations between different recipe collections, even enabling the comparison between manuscripts in different languages. Its semantic model covers many and very heterogeneous aspects of the transmission of cooking knowledge including, just to give an example, the treatments of a condition or illness. CoReMa is an ambitious project and food historians will not be the only scholars that will benefit from such a comprehensive (and intuitive) resource.