Showing items 1 - 10 of 24
  • Publication
    Open access
    A new corpus annotation framework for Latin diachronic lexical semantics
    (2022-7-16)
    McGillivray, Barbara
    ;
    Kondakova, Daria
    ;
    Burman, Annie
    ;
    ; ; ;
    Márquez Cruz, Manuel
    We present a new corpus-based resource and methodology for the annotation of Latin lexical semantics, consisting of 2,399 annotated passages of 40 lemmas from the Latin diachronic corpus LatinISE. We also describe how the annotation was designed, analyse annotators’ styles, and present the preliminary results of a study on the lexical semantics and diachronic change of the 40 lemmas. We complement this analysis with a case study on semantic vagueness. As the availability of digital corpora of ancient languages increases, and as computational research develops new methods for large-scale analysis of diachronic lexical semantics, building lexical semantic annotation resources can shed new light on large-scale patterns in the semantic development of lexical items over time. We share recommendations for designing the annotation task that will hopefully help similar research on other less-resourced or historical languages.
  • Publication
    Open access
    Pygmalion in the classroom: a tool to draw lexicographic diachronic maps and their application to didactics
    (Madrid: Guillermo Escolar Editor, 2022) ; ;
    This contribution presents Pygmalion, a tool that facilitates the creation of interactive diachronic maps, and focuses on some of its possible applications to the didactics of languages and linguistics. Pygmalion was conceived in the framework of the project A world of possibilities. Modal pathways over an extra-long period of time: the diachrony of modality in the Latin language (WoPoss). Although its initial conceptualisation was heavily influenced by the research questions of this project and, therefore, the visualisation of modality was a decisive feature, the tool was later redesigned for a broader use. In fact, to increase usability, we offer three different versions to better suit users’ requirements. The primary goal of Pygmalion is to provide scholars, teachers, and learners with an instrument to visually represent the heterogeneous diachronic linguistic information contained in lexicographic works. The conceptualisation of this type of resource raises a twofold objective: while we need to address the difficulties of designing a visualisation that illustrates complex concepts, such as semantic shifts and meaning relations, it is crucial to ensure the readability of the data through a user-friendly and intuitive tool.
  • Publication
    Open access
    Towards a common model for European Poetry: Challenges and solutions
    (2022) ;
    Díez-Platas, M.L.
    ;
    Ros, Salvador
    ;
    González-Blanco, Elena
    This paper stems from the analysis of multiple poetic resources that were available online, as well as the results of methodological discussions with scholars of European Literature. The goal was to retrieve the informational needs of all these different sources in order to build a common data model for European Poetry (EP). Thus, by implementing a reverse engineering method, we have created the Domain Model for EP, which is an important breakthrough for making existing poetry resources interoperable. The lack of a uniform academic approach to analysing and classifying poetic manifestations and the divergence of theories when comparing poetry schools from different languages and periods are some of the factors that hinder the modelling process. In this paper, we will present some of the challenges we encountered while conceptualizing the information relevant to poetic analysis and how we have worked around them. Some elements of the ontology will be presented to illustrate our modelling strategies.
  • Publication
    Open access
    Setting Up Bilingual Comparable Corpora with Non-Contemporary Languages
    (Marseille, France: European Language Resources Association, 2022) ; ; ;
    This paper presents the project “Les corpora latins et français: une fabrique pour l’accès à la représentation des connaissances” (Latin and French Corpora: a Factory For Accessing Knowledge Representation) whose focus is the study of modality in both Latin and French by means of multi-genre, diachronic comparable corpora. The setting up of such corpora involves a number of conceptualisation challenges, in particular with regard to how to compare two asynchronous textual productions corresponding to different cultural frameworks. In this paper we outline the rationale of designing comparable corpora to explore our research questions and then focus on some of the issues that arise when comparing different diachronic spans of Latin and French. We also explain how these issues were dealt with, thus providing some grounds upon which other projects could build their methodology.
  • Publication
    Open access
    L’édition numérique au service de la philologie matérielle. Modèles de la lyrique galégo-portugaise
    (Santiago de Compostela: Centro Ramón Piñeiro para a Investigación en Humanidades, 2022)
  • Publication
    Open access
    TEI-friendly annotation scheme for medieval named entities: a case on a Spanish medieval corpus
    (2021)
    Álvarez-Mellado, Elena
    ;
    Díez-Platas, M.L.
    ;
    Ruiz Fabo, Pablo
    ;
    ;
    Ros, Salvador
    ;
    González-Blanco, Elena
    Medieval documents are a rich source of historical data. Performing named-entity recognition (NER) on this genre of texts can provide us with valuable historical evidence. However, traditional NER categories and schemes are usually designed with modern documents in mind (e.g. journalistic text), and general-domain NER annotation schemes fail to capture the nature of medieval entities. In this paper we explore the challenges of performing named-entity annotation on a corpus of Spanish medieval documents: we discuss the mismatches that arise when applying traditional NER categories to such a corpus, and we propose a novel humanist-friendly TEI-compliant annotation scheme and guidelines intended to capture the particular nature of medieval entities.
  • Publication
    Open access
    Reviewing the bread and butter of CoReMa, Cooking Recipes of the Middle Ages
    CoReMa, Cooking Recipes of the Middle Ages is an ongoing project whose method involves a semantically annotated, conservative edition of medieval manuscripts containing cooking recipes. With a rigorous philological approach and the aid of semantic web technologies, CoReMa’s method aims at teasing out the textual relations between different recipe collections, even enabling the comparison between manuscripts in different languages. Its semantic model covers many and very heterogeneous aspects of the transmission of cooking knowledge including, just to give an example, the treatments of a condition or illness. CoReMa is an ambitious project and food historians will not be the only scholars that will benefit from such a comprehensive (and intuitive) resource.
  • Publication
    Open access
    Trobadores de corte en corte: Visualización dos centros culturais ibéricos tardomedievais
    (Santiago de Compostela: Universidade de Santiago de Compostela, Servizo de Publicacións, 2021)
    The dynamics of political negotiations, social conflicts, and relations of power in society are inseparable from the culture it produces. The complexity of the Galician-Portuguese lyric provides a particularly illuminating opportunity to examine how the relations of cultural centers change through the Late Middle Ages by studying the biographic profiles of the creators of texts and their relations of patronage. The following is a proposal to explore from an educational perspective the sociocultural context in which the cultural movement of the troubadours was born and developed.
  • Publication
    Open access
    Implemented to Be Shared: the WoPoss Annotation of Semantic Modality in a Latin Diachronic Corpus
    The FNS project A world of possibilities (WoPoss) studies the evolution of modal meanings in the Latin language. Passages expressing modal notions such as ‘possibility’ and ‘necessity’ are annotated following a pipeline that combines both automatic and manual annotation. This paper discusses the creation, annotation and processing of the WoPoss corpus. Texts are first gathered from different online open access resources to create the initial dataset. Due to the heterogeneity of formats and encodings, these texts are regularized before the application of an automatic linguistic annotation. The annotated files are then uploaded to the annotation platform INCEpTION. Through this platform, annotators add the relevant linguistic and semantic information following the WoPoss guidelines. The results of the automatic annotation are also curated. The fine-grained semantic annotation is the core activity of the WoPoss workflow; this paper therefore focuses on the preparation of files and how the semantic annotation task is tackled.
  • Publication
    Open access
    The Diachronic Spanish Sonnet Corpus: TEI and linked open data encoding, data distribution, and metrical findings
    (2020)
    Ruiz Fabo, Pablo
    ;
    ;
    Martínez Cantón, Clara
    ;
    González-Blanco, Elena
    How has the sonnet form in Spanish evolved over the centuries? What is the distribution of metrical patterns and combinations thereof, considering diachronic, geographical, and social factors? What rhyme schemes are favoured in different periods and regions? How is enjambment distributed within the sonnet? Providing quantitative answers to such questions requires a corpus spanning several centuries, annotated for the relevant literary features and containing author metadata. The absence of appropriate digital resources to undertake a macroanalytic study of the evolution of the sonnet in Spanish led us to create the Diachronic Spanish Sonnet Corpus. This article presents how the corpus was designed to provide quantitative evidence on the evolution of sonnets in Spanish, and our findings regarding metrics and enjambment. The corpus contains 4,085 sonnets by 1,204 Spanish and Latin American authors (15th to 19th centuries), encoded in TEI, with RDFa attributes. The corpus aims at breadth, including many peripheral authors besides some major ones. Author metadata were encoded (dates, origin, gender). Scansion and enjambment were annotated automatically, with the ADSO and ANJA tools. The range of authors and periods, the use of TEI and RDFa for interoperability, and the combination of metrical and enjambment annotations go beyond previously available digital resources. The corpus allowed us to examine the evolution of metrical patterns and their combinations after the Golden Age, complementing earlier studies. We also observed an increase in enjambment across the tercets in the 19th century, which may indicate increased variety in the discourse organization of sonnets in the period.