The information value of textual sentiment in economics and finance
Thesis supervisor |
David Ardia, Kris Boudt |
Abstract |
This research project contributes to the analysis of sentiment in
texts by asking whether “a deeper understanding of textual
sentiment maximizes its information value”. A wide variety of
texts on related economic and financial topics
has become cheaply available at large scale, and these texts are an
increasingly important driver of the economy. Texts express
a positive, negative, or neutral tone, known as textual sentiment,
which provides information incremental to quantitative data. Natural
language processing and machine learning techniques have made
significant progress in assigning sentiment to texts. What is lacking,
however, is a decomposition of sentiment, knowledge about the
underlying multivariate process by which sentiment is generated, and
a framework for accurate statistical analysis. Understanding
sentiment at a deeper level is key to grasping the full information
potential of the large data sets that texts constitute; the focus
here is on news articles and corporate disclosures.
I put forward three research questions, each targeting a distinct feature of textual sentiment. First, I ask whether the dimensionality of sentiment and its evolution provide information beyond that of a single sentiment estimate. Second, I ask whether a more precise estimation of textual sentiment improves its informativeness. Third, I investigate the added value of linking the cyclicality of textual sentiment to other cyclical variables. The overarching hypothesis is that accounting for the dimensionality, precision, and cyclicality of textual sentiment enhances its information value.
I develop a three-layered framework called “Sentometrics” to address the research questions and to obtain a deeper understanding of textual sentiment. The first layer consists of a dynamic factor model that reveals the common components of textual sentiment. The second layer is a time-varying stochastic model of the generation of positive and negative words in texts. Such a parametric approach is entirely new to the textual sentiment literature. The third layer integrates textual sentiment into a regime-switching model.
To validate the main hypothesis, I will deploy this enhanced textual sentiment modeling in three specific economic and financial applications, focusing on the supplementary information value that is obtained. The validation exercises will cover, respectively, the prediction of macroeconomic indicators, trading performance, and covariance matrices. I analyze extensive firm-specific textual data, namely media coverage, annual reports, and quarterly earnings press releases. A successful development of the “Sentometrics” toolbox would fill the gap in econometric tools for deciphering textual sentiment, making it possible to extract and exploit more detailed information on the sources, dynamics, and uncertainty of sentiment in texts.
The applications may improve the risk protection of companies, banks, and financial institutions, potentially increasing the stability of financial markets and the economy. Because the modeling is application-agnostic, it can be used in many subsequent research settings, including marketing, politics, and web intelligence, giving many fields a means to use textual sentiment as an optimally informative variable. |
Keywords |
Time series, econometrics, text mining, sentiment analysis. |
Project type | Thesis research |
Research field | sentiment analysis, time series econometrics |
Funding source | SNF |
Status | Ongoing |
Project start | 1-9-2018 |
Project end | 31-8-2022 |
Contact | Samuel Borms |