The information value of textual sentiment in economics and finance
Thesis supervisors David Ardia,
Kris Boudt
Abstract This research project contributes to the analysis of sentiment in texts by asking whether “a deeper understanding of textual sentiment maximizes its information value”. A wealth of texts on related economic and financial topics has become cheaply available at large scale, and these texts are an increasingly important driver of the economy. Texts express a positive, negative, or neutral tone, called textual sentiment, which provides information incremental to quantitative data. Natural language processing and machine learning techniques have made significant progress in assigning sentiment to texts. What is lacking, however, is a decomposition of sentiment, knowledge about the underlying multivariate process that generates sentiment, and a framework for accurate statistical analysis. Understanding sentiment at a deeper level is key to realizing the full information potential of the large data sets that texts constitute. The focus is on news articles and corporate disclosures.

I put forward three research questions, each targeting a distinct feature of textual sentiment. First, I ask whether the dimensionality of sentiment and its evolution provide supplementary information compared with single sentiment estimates. Second, I ask whether a more precise estimation of textual sentiment improves its informativeness. Third, I investigate the added value of linking the cyclicality of textual sentiment to other cyclical variables. The overarching hypothesis is that accounting for the dimensionality, precision, and cyclicality of textual sentiment enhances its information value.

I develop a three-layered framework called “Sentometrics” to address the research questions, motivated by the aim of obtaining a deeper understanding of textual sentiment. The first layer consists of a dynamic factor model that reveals the common components of textual sentiment. The second layer is a time-varying stochastic model of the generation of positive and negative words in texts. Such a parametric approach is entirely new to the textual sentiment literature. The third layer integrates textual sentiment into a regime-switching model. To validate the main hypothesis, I will deploy this enhanced textual sentiment modeling in three specific economic and financial applications, focusing on the supplementary information value that is obtained. The validation exercises will cover, respectively, the prediction of macroeconomic indicators, trading performance, and covariance matrices. I analyze extensive firm-specific textual data, namely media coverage, annual reports, and quarterly earnings press releases.
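As a rough illustration of the idea behind the first layer, the sketch below simulates several sentiment series driven by one common factor and recovers that factor with a cross-sectional average of the standardized series. This is a deliberately crude stand-in for a dynamic factor model, not the project's actual estimator. The function names, the AR(1) factor, the loadings, and all parameter values are assumptions made for this example only.

```python
import random
import statistics


def simulate_sentiment_panel(n_periods=300, n_series=6, phi=0.8, noise_sd=0.5, seed=0):
    """Simulate sentiment series driven by one hypothetical AR(1) common factor."""
    rng = random.Random(seed)
    factor = [0.0]
    for _ in range(n_periods - 1):
        factor.append(phi * factor[-1] + rng.gauss(0.0, 1.0))
    # Assumed positive loadings: every series moves with the common factor.
    loadings = [rng.uniform(0.5, 1.5) for _ in range(n_series)]
    # panel[i] holds series i over time: loading * factor + idiosyncratic noise.
    panel = [[load * f + rng.gauss(0.0, noise_sd) for f in factor] for load in loadings]
    return factor, panel


def standardize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [(x - mean) / sd for x in series]


def common_component(panel):
    """Average the standardized series period by period (a crude factor proxy)."""
    standardized = [standardize(series) for series in panel]
    return [statistics.fmean(column) for column in zip(*standardized)]


def correlation(x, y):
    """Pearson correlation computed from standardized values."""
    zx, zy = standardize(x), standardize(y)
    return statistics.fmean([a * b for a, b in zip(zx, zy)])


true_factor, panel = simulate_sentiment_panel()
estimated_factor = common_component(panel)
print(f"correlation with the simulated factor: {correlation(true_factor, estimated_factor):.2f}")
```

The averaging step only works here because all simulated loadings are positive. A dynamic factor model would instead estimate the loadings and the factor dynamics jointly.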

A successful development of the “Sentometrics” toolbox fills the gap left by the lack of econometric tools to decipher textual sentiment, making it possible to extract and exploit more detailed information on the sources, dynamics, and uncertainty of sentiment in texts. The applications may improve the risk protection of companies, banks, and financial institutions, potentially increasing the stability of financial markets and the economy. Because the modeling is set up independently of any particular application, it can be used in many subsequent research settings, including marketing, politics, and web intelligence, giving many fields a means to use textual sentiment as an optimally informative variable.
Keywords Time series, econometrics, text mining, sentiment analysis.
Project type Thesis research
Research field sentiment analysis, time series econometrics
Funding source SNF
Status Ongoing
Project start 1-9-2018
Project end 31-8-2022
Contact Samuel Borms