The Hype Index: an NLP-driven Measure of Market News Attention

Zheng Cao, Wanchaloem Wunkaew, Helyette Geman

arXiv.org Artificial Intelligence 

Natural Language Processing (NLP) has become an increasingly powerful tool in finance, transforming how researchers and practitioners extract predictive signals from unstructured text. With the rise of real-time news feeds and scalable NLP models, media content now plays a central role in market forecasting, risk management, and behavioral analysis. This paper contributes to that growing body of literature by introducing a novel framework for measuring media-driven attention in equities: the Hype Index. Our approach begins with the construction of a News Count-Based Hype Index, which quantifies the relative media exposure of each stock or sector as its share of daily financial news coverage within the S&P 100 universe. This measure captures how prominently a given asset appears in financial media, without reference to its economic footprint. To address the resulting size-related bias and isolate disproportionate attention, we introduce the Capitalization Adjusted Hype Index. Defined as the ratio of a stock's or sector's news count weight to its market capitalization weight within its peer cluster, this adjusted index measures deviations from a benchmark of proportionality, where a ratio of one means that attention matches economic size. In doing so, it highlights assets that receive media attention in excess of what would be expected based on their economic size.
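The two definitions in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the tickers, and all numbers are hypothetical, and the peer cluster is assumed to be simply the set of assets passed in.

```python
def news_count_hype(news_counts):
    """Each asset's share of the day's news items (hypothetical helper)."""
    total = sum(news_counts.values())
    if total == 0:
        # No news at all that day: assign zero share to every asset.
        return {asset: 0.0 for asset in news_counts}
    return {asset: count / total for asset, count in news_counts.items()}


def cap_adjusted_hype(news_counts, market_caps):
    """News count weight divided by market cap weight within the cluster.

    Assumes the peer cluster is exactly the set of keys in news_counts
    and that every market cap is positive.
    """
    news_weights = news_count_hype(news_counts)
    total_cap = sum(market_caps[asset] for asset in news_counts)
    return {
        asset: news_weights[asset] / (market_caps[asset] / total_cap)
        for asset in news_counts
    }


# Made-up one-day data for three hypothetical tickers.
counts = {"AAA": 30, "BBB": 10, "CCC": 10}
caps = {"AAA": 500.0, "BBB": 300.0, "CCC": 200.0}

count_index = news_count_hype(counts)
adjusted_index = cap_adjusted_hype(counts, caps)

for asset in counts:
    print(asset, round(count_index[asset], 3), round(adjusted_index[asset], 3))
```

In this toy example AAA has the largest news share, but most of that share is explained by its size, so its adjusted value is only modestly above one. BBB falls below one because it receives less coverage than its capitalization weight would suggest, and CCC sits at the proportionality benchmark.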