Correia Saavedra, David
Search results
Meaning differences between English clippings and their source words: A corpus-based study
2023, Martin Hilpert, David Correia Saavedra, Jennifer Rains
This paper uses corpus data and methods of distributional semantics in order to study English clippings such as dorm (< dormitory), memo (< memorandum), or quake (< earthquake). We investigate whether systematic meaning differences between clippings and their source words can be detected. The analysis is based on a sample of 50 English clippings. Each of the clippings is represented by a concordance of 100 examples in context that were gathered from the Corpus of Contemporary American English. We compare clippings and their source words both at the aggregate level and in terms of comparisons between individual clippings and their source words. The data show that clippings tend to be used in contexts that represent involved text production, which aligns with the idea that clipped words signal familiarity with their referents. It is further observed that individual clippings and their source words partly diverge in their distributional profiles, reflecting both overlap and differences with regard to their meanings. We interpret these findings against the theoretical background of Construction Grammar and specifically the Principle of No Synonymy.
Measurements of grammaticalization: developing a quantitative index for the study of grammatical change
2019, Correia Saavedra, David, Hilpert, Martin, Petré, Peter
There is a broad consensus that grammaticalization is a process that is gradual and largely unidirectional: lexical elements acquire grammatical functions, and grammatical elements can undergo further grammaticalization (Hopper and Traugott 2003). This consensus is based on substantial cross-linguistic evidence. While most existing studies present qualitative evidence, this dissertation discusses a quantitative approach that tries to measure degrees of grammaticalization using corpus-based variables, thereby complementing existing qualitative work. The proposed measurement is calculated on the basis of several parameters that are known to play a role in grammaticalization (Lehmann 2002, Hopper 1991), such as token frequency, phonological length, collocational diversity, colligate diversity, and dispersion. These variables are used in a binary logistic regression model which can assign a score to a given linguistic item that reflects its degree of grammaticalization. Grammaticalization can be conceived in synchrony and in diachrony. The synchronic view of grammaticalization is concerned with the fact that some items are more grammaticalized than others, which is commonly referred to as gradience. The diachronic view is concerned with the development of grammatical elements over time, whereby elements become increasingly grammatical through small incremental steps, which is also known as gradualness. This dissertation proposes studies that deal with each of these views. In order to quantify the gradience of grammaticalization, data from the British National Corpus is used. 264 lexical and 264 grammatical elements are selected in order to train a binary logistic regression model. This model can rank these items and determine which ones are more lexical or more grammatical. The results indicate that the model makes successful predictions overall. In addition, generalizations regarding grammaticalization can be supported, such as the relevance of key variables (e.g. token frequency, diversity to the left of a given item) or the ranking of morphosyntactic categories as a whole (e.g. adverbs are on average in between the lexical and grammatical categories). The gradualness of grammaticalization is investigated using a selection of twenty elements that have grammatical and lexical counterparts in English (e.g. keep). The Corpus of Historical American English (1810s-2000s) is used to retrieve the relevant data. The aim is to check how the different variables and the grammaticalization scores develop over time. The main theoretical value of this approach is that it can offer an empirically operationalized way of measuring unidirectionality in grammaticalization, as opposed to a more qualitative observation based on individual case studies.
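The scoring idea in the abstract above (a binary logistic regression that assigns each item a value between "lexical" and "grammatical") could be sketched as follows. This is only an illustrative sketch, not the dissertation's implementation. The three features, the six toy rows, the learning rate, and the function names are all invented for the example; the actual study uses more variables and 528 items from the British National Corpus.

```python
import math

def sigmoid(z):
    """Map a real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.05, epochs=2000):
    """Fit weights and a bias by plain batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def score(x, weights, bias):
    """Return a value in (0, 1); higher means 'more grammatical' in this toy setup."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

# Invented toy data: [log token frequency, length in phonemes, dispersion].
TOY_ROWS = [
    [2.0, 7.0, 0.20],  # labelled lexical (0)
    [2.5, 6.0, 0.30],
    [3.0, 5.0, 0.40],
    [5.0, 2.0, 0.90],  # labelled grammatical (1)
    [5.5, 3.0, 0.80],
    [6.0, 2.0, 0.95],
]
TOY_LABELS = [0, 0, 0, 1, 1, 1]

WEIGHTS, BIAS = train_logistic(TOY_ROWS, TOY_LABELS)

if __name__ == "__main__":
    # Rank the toy items from most lexical to most grammatical.
    for row in sorted(TOY_ROWS, key=lambda r: score(r, WEIGHTS, BIAS)):
        print(row, round(score(row, WEIGHTS, BIAS), 3))
```

Sorting items by this score gives a ranking from more lexical to more grammatical, which is the kind of gradience the abstract describes.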
Why are grammatical elements more evenly dispersed than lexical elements? Assessing the roles of text frequency and semantic generality
2017, Hilpert, Martin, Correia Saavedra, David
Grammatical elements such as determiners, conjunctions or pronouns are very evenly dispersed across natural language data. By contrast, the uses of lexical elements have a stronger tendency to occur in bursts that are interspersed by long lulls. This paper considers two alternative explanations for this difference. First, it could be hypothesised that the more even distribution of grammatical elements is merely an effect of an element’s high text frequency. Alternatively, it could be argued that a more even distribution is a symptom of greater generality in meaning. In order to assess the impact of both frequency and semantic generality, we conducted a corpus-based study that contrasts lexical and grammatical elements in Present-Day English. Our results suggest that evenness of dispersion is chiefly an effect of high frequency.
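To make "evenness of dispersion" concrete, here is a minimal sketch of one common dispersion measure, Gries's deviation of proportions (DP). The abstract does not state which measure the paper uses, so this choice is an assumption. The function name and the example counts are hypothetical.

```python
def deviation_of_proportions(part_sizes, part_counts):
    """Compute DP: half the summed absolute difference between the share of
    the corpus each part makes up and the share of the item's occurrences
    found in that part. Values near 0 mean even dispersion; values near 1
    mean the item is concentrated in a small portion of the corpus."""
    if len(part_sizes) != len(part_counts):
        raise ValueError("part_sizes and part_counts must have the same length")
    total_size = sum(part_sizes)
    total_count = sum(part_counts)
    if total_size == 0 or total_count == 0:
        raise ValueError("corpus and item counts must be non-empty")
    return 0.5 * sum(
        abs(count / total_count - size / total_size)
        for size, count in zip(part_sizes, part_counts)
    )

if __name__ == "__main__":
    sizes = [100, 100, 100, 100]  # hypothetical corpus parts of equal size
    print(deviation_of_proportions(sizes, [5, 5, 5, 5]))   # evenly spread item
    print(deviation_of_proportions(sizes, [20, 0, 0, 0]))  # item in one burst
```

Because raw frequency affects how evenly an item can be spread, comparing items of very different frequency with such a measure needs care. That is the confound the paper sets out to assess.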
Investigating English clippings experimentally:
2023, Martin Hilpert, David Correia Saavedra, Jennifer Rains
The unidirectionality of semantic changes in grammaticalization: an experimental approach to the asymmetric priming hypothesis
2018, Hilpert, Martin, Correia Saavedra, David
Why is semantic change in grammaticalization typically unidirectional? It is a well-established finding that in grammaticalizing constructions, more concrete meanings tend to evolve into more schematic meanings. Jäger & Rosenbach (2008) appeal to the psychological phenomenon of asymmetric priming in order to explain this tendency. This article aims to evaluate their proposal on the basis of experimental psycholinguistic evidence. Asymmetric priming is a pattern of cognitive association in which one idea strongly evokes another (e.g. paddle strongly evokes water), while that second idea does not evoke the first one with the same force (water only weakly evokes paddle). Asymmetric priming would elegantly explain why semantic change in grammaticalization tends to be unidirectional, as in the case of English be going to, which has evolved out of the lexical verb go. As yet, empirical engagement with Jäger & Rosenbach's hypothesis has been limited. We present experimental evidence from a maze task (Forster et al. 2009), in which we test whether asymmetric priming obtains between lexical forms (such as go) and their grammaticalized counterparts (be going to). On the asymmetric priming hypothesis, the former should prime the latter, but not vice versa. Contrary to the hypothesis, we observe a negative priming effect: speakers who have recently been exposed to a lexical element are significantly slower to process its grammaticalized variant. We interpret this observation as a horror aequi phenomenon (Rohdenburg & Mondorf 2003).
Give me your name and I'll tell you whether you speak with an accent: The effect of proper names ethnicity on listener expectations
2016-12-5, Prikhodkine, Alexei, Correia Saavedra, David, Dos Santos Mamed, Marcelo
The mastery of a national language tends to be regarded as a key element in foreigners’ integration in Switzerland and as a gateway to equal opportunity. In this article, the limitations of this claim are explored through a study measuring the effect of proper names’ ethnicity on speech perception. A hundred and fifty Swiss respondents had to rate six speakers who were presented as candidates for a job as a communication manager in a Swiss bank. These six speakers spent most of their lives in French-speaking Switzerland and spoke the Standard variety. Our findings indicate that a proper name with an ethnic minority component can result in its bearer being judged as having a stronger foreign accent and as being less suitable for the job. Results are discussed in terms of a discrepancy between cultural nationality and legal citizenship in modern nation-states. This article also shows that studying the effect of proper names, and more generally fine-grained non-verbal cues, on speech perception is a promising research domain in the sociolinguistics of migration, as it provides us with a multi-dimensional appreciation of ethnic identities.
A multivariate approach to English Clippings
2021-9-30, Correia Saavedra, David
This paper addresses the morphological word formation process that is known as clipping. In English, that process yields shortened word forms such as lab (< laboratory), exam (< examination), or gator (< alligator). It is frequently argued (Davy 2000, Durkin 2009, Haspelmath & Sims 2010, Don 2014) that clipping is highly variable and that it is difficult to predict how a given source word will be shortened. We draw on recent work (Lappe 2007, Jamet 2009, Berg 2011, Alber & Arndt-Lappe 2012, Arndt-Lappe 2018) in order to challenge that view. Our main hypothesis is that English clipping follows predictable tendencies, that these tendencies can be captured by a probabilistic, multifactorial model, and that the features of that model can be explained functionally in terms of cognitive, discourse-pragmatic, and phonological factors. Cognitive factors include the principle of least effort (Zipf 1949), an important discourse-pragmatic factor is the recoverability of the source word (Tournier 1985), and phonological factors include issues of stress and syllable structure (Lappe 2007). While the individual influence of these factors on clipping has been recognized, their interaction and their relative importance remain to be fully understood. The empirical analysis in this paper will use Hierarchical Configural Frequency Analysis (Krauth & Lienert 1973, Gries 2008) on the basis of a large, newly compiled database of more than 2000 English clippings. Our analysis allows us to detect regularities in the way speakers of English create clippings. We argue that there are several English clipping schemas that are optimized for processability.
Using token-based semantic vector spaces for corpus-linguistic analyses: From practical applications to tests of theoretical claims
2017, Hilpert, Martin, Correia Saavedra, David
This paper presents token-based semantic vector spaces as a tool that can be applied in corpus-linguistic analyses such as word sense comparisons, comparisons of synonymous lexical items, and matching of concordance lines with a given text. We demonstrate how token-based semantic vector spaces are created, and we illustrate the kinds of result that can be obtained with this approach. Our main argument is that token-based semantic vector spaces are not only useful for practical corpus-linguistic applications but also for the investigation of theory-driven questions. We illustrate this point with a discussion of the asymmetric priming hypothesis (Jäger and Rosenbach 2008). The asymmetric priming hypothesis, which states that grammaticalizing constructions will be primed by their lexical sources but not vice versa, makes a number of empirically testable predictions. We operationalize and test these predictions, concluding that token-based semantic vector spaces yield conclusions that are relevant for linguistic theory-building.
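As a rough illustration of what a token-based semantic vector is, the sketch below builds one vector per concordance line by averaging the type vectors of the context words, then compares tokens by cosine similarity. This is one common way to build such vectors and is assumed here; the abstract does not spell out the paper's exact procedure. The two-dimensional type vectors, the example contexts, and the function names are invented for the example.

```python
import math

def token_vector(context_words, type_vectors):
    """Average the type vectors of the context words seen around one token.
    Words without a type vector are skipped."""
    dims = len(next(iter(type_vectors.values())))
    known = [type_vectors[w] for w in context_words if w in type_vectors]
    if not known:
        return [0.0] * dims
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Invented toy type vectors; real ones would come from corpus co-occurrence counts.
TYPE_VECTORS = {
    "river": [1.0, 0.0],
    "water": [0.9, 0.1],
    "money": [0.0, 1.0],
    "loan": [0.1, 0.9],
}

# Three hypothetical concordance contexts for the word "bank".
TOKEN_A = token_vector(["river", "water"], TYPE_VECTORS)
TOKEN_B = token_vector(["money", "loan"], TYPE_VECTORS)
TOKEN_C = token_vector(["the", "water", "river"], TYPE_VECTORS)

if __name__ == "__main__":
    print(cosine(TOKEN_A, TOKEN_C))  # same sense: high similarity
    print(cosine(TOKEN_A, TOKEN_B))  # different senses: low similarity
```

Tokens used in the same sense end up close together in the space, which is what makes word sense comparisons and matching concordance lines to a text possible.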