The information value of textual sentiment in economics and finance
Thesis directors David Ardia, Kris Boudt
Abstract This research project contributes to the analysis of sentiment in texts by asking whether “a deeper understanding of textual sentiment maximizes its information value”. Large volumes of texts on interrelated economic and financial topics, in many different forms, have become cheaply available and are an increasingly important driver of the economy. Texts express a positive, negative, or neutral tone, referred to as textual sentiment, which provides information incremental to quantitative data. Natural language processing and machine learning techniques have made significant progress in assigning sentiment to texts. What is lacking, however, is a decomposition of sentiment, knowledge of the underlying multivariate process that generates sentiment, and a framework for accurate statistical analysis. Understanding sentiment at a deeper level is key to unlocking the full information potential of texts as big data sets, with a focus on news articles and corporate disclosures.
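To make the notion of textual sentiment concrete, a common baseline is lexicon-based scoring: count the positive and negative words in a text and combine the counts into a single net tone, where a text with no sentiment words is scored as neutral. The Python sketch below is only an illustration under stated assumptions; the two toy word lists and the function name `net_sentiment` are hypothetical, and this is not the estimation method developed in the project.

```python
import re

# Hypothetical toy lexicons, for illustration only; applied work would
# rely on an established (e.g. finance-specific) sentiment lexicon.
POSITIVE = {"growth", "profit", "strong", "improve", "gain"}
NEGATIVE = {"loss", "decline", "weak", "risk", "default"}


def net_sentiment(text: str) -> float:
    """Return (pos - neg) / (pos + neg), or 0.0 (neutral) if no lexicon word occurs."""
    # Lowercase the text and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


print(net_sentiment("Strong profit growth offset a small loss."))  # 0.5
```

The score lies in [-1, 1]: three positive words and one negative word give (3 - 1) / 4 = 0.5, while a text without any lexicon word is treated as neutral.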

I put forward three research questions, each targeting a distinct feature of textual sentiment. First, I ask whether the dimensionality of sentiment and its evolution provide supplementary information compared with single sentiment estimates. Second, I ask whether a more precise estimation of textual sentiment improves its informativeness. Third, I investigate the added value of linking the cyclicality of textual sentiment to other cyclical variables. The overarching hypothesis is that accounting for the dimensionality, precision, and cyclicality of textual sentiment enhances its information value.

I develop a three-layered framework, called “Sentometrics”, to address the research questions and to obtain a deeper understanding of textual sentiment. The first layer consists of a dynamic factor model that reveals the common components of textual sentiment. The second layer is a time-varying stochastic model of the generation of positive and negative words in texts; such a parametric approach is entirely new to the textual sentiment literature. The third layer integrates textual sentiment into a regime-switching model. To validate the main hypothesis, I will deploy this enhanced textual sentiment modeling in three economic and financial applications, focusing on the supplementary information value it yields. The validation exercises will cover, respectively, the prediction of macroeconomic indicators, trading performance, and covariance matrices. We analyze extensive firm-specific textual data, namely media coverage, annual reports, and quarterly earnings press releases.
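One simple way to see why the precision of a sentiment estimate matters, and what a stochastic model of positive and negative word generation can add, is to treat every sentiment-bearing word in period t as an independent draw that is positive with probability p_t. Pooling the word counts of all texts in a period then yields an estimate of p_t together with a binomial standard error. The Python sketch below is a deliberately simplified stand-in, not the Sentometrics model itself: the independence assumption, the names `PeriodEstimate` and `estimate_period`, and the input format of (positive, negative) count pairs are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PeriodEstimate:
    p_hat: float      # estimated probability that a sentiment word is positive
    std_error: float  # binomial standard error of p_hat
    n_words: int      # number of sentiment-bearing words pooled in the period


def estimate_period(counts: list[tuple[int, int]]) -> PeriodEstimate:
    """Pool (positive, negative) word counts of all texts in one period.

    Simplifying assumption: every sentiment word is an independent
    Bernoulli draw with the same probability p_t within the period.
    """
    pos = sum(p for p, _ in counts)
    n = sum(p + q for p, q in counts)
    if n == 0:
        raise ValueError("no sentiment-bearing words in this period")
    p_hat = pos / n
    std_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return PeriodEstimate(p_hat, std_error, n)


# Same estimated share of positive words, very different precision.
sparse = estimate_period([(3, 1)])
dense = estimate_period([(30, 10), (45, 15)])
print(sparse.p_hat, dense.p_hat)
print(round(sparse.std_error, 3), round(dense.std_error, 3))
```

Both periods give an estimated share of 0.75, but the second rests on 100 words instead of 4, so its standard error is five times smaller (about 0.043 versus 0.217). Re-estimating p_t period by period gives a crude time-varying series; a full parametric model would instead specify how p_t evolves over time.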

A successful development of the “Sentometrics” toolbox would fill the gap left by missing econometric tools for deciphering textual sentiment, making it possible to extract and exploit more detailed information on the sources, dynamics, and uncertainty of sentiment in texts. Our applications may improve the risk protection of companies, banks, and financial institutions, potentially increasing the stability of financial markets and the economy. Because the modeling is application-agnostic, it can be reused in many subsequent research settings, including marketing, politics, and web intelligence, giving many fields a means to use textual sentiment as an optimally informative variable.
Keywords Time series, econometrics, text mining, sentiment analysis.
Type of project Dissertation project
Research area sentiment analysis, time series econometrics
Method of financing SNF
Status Ongoing
Start of project 1-9-2018
End of project 31-8-2022
Contact Samuel Borms