Who Wrote this Novel? Authorship Attribution across Three Languages
Publication date
Revue Tranel (Travaux neuchâtelois de linguistique), Institut des sciences du langage et de la communication, Université de Neuchâtel, 2011/55//59-75
Based on different definitions of writing style, various authorship attribution schemes have been proposed to identify the real author of a given text or text excerpt. In this article we analyze the relative performance of word types or lemmas used to represent styles and texts. As a second objective we compare two authorship attribution approaches: one based on principal component analysis (PCA), and a new authorship attribution method involving specific vocabulary (the Z score classification scheme). As a third goal we carry out our experiments on three corpora written in three different languages (English, French, and German). In the first we categorize 52 text excerpts (taken from 19th century English novels) written by nine authors. In the second we work with 44 segments taken from French novels (mainly 19th century) written by eleven authors. In the third we extract 59 German text excerpts written by 15 authors and covering the 19th and early 20th centuries. Based on these collections and two specific features (word types or lemmas), we demonstrate that the Z score method performs better than PCA, and that lemmas tend to produce slightly better performance than word types.
Publication type
journal article