Optimal Text-Based Time-Series Indices
arXiv.org Artificial Intelligence
This integration is typically done by (i) selecting, (ii) transforming, and (iii) aggregating textual content into a time-series representation (see Ardia et al., 2019; Algaba et al., 2020, for a general overview of these steps). Many studies have focused on steps (ii) and (iii), transforming and aggregating textual data into a quantitative measure such as sentiment (see, e.g., Loughran and McDonald, 2014; Jegadeesh and Wu, 2013; Manela and Moreira, 2017). The essential selection step (i), which usually relies on subjective ad hoc rules, has received little attention. We aim to fill this gap by proposing an approach to construct text-based time-series indices optimally. Specifically, our algorithm determines which set of texts, among a large corpus, leads to a text-based index that is optimal for a specific objective: typically, an index that maximizes the contemporaneous relation or the predictive performance with respect to a target variable, such as inflation. Our methodology relies on binary selection matrices that, applied to the vocabulary of tokens, select the relevant texts in the corpus.
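To make the selection idea concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a toy corpus, a binary document-term matrix, and a binary selection vector over the vocabulary (a simplification of the selection matrices mentioned above). It also assumes that a text is selected when it contains at least one selected token, that the index is the per-period count of selected texts, and that the objective is the contemporaneous correlation with the target. The names `build_index` and `best_selector`, the vocabulary, and all data are hypothetical. The exhaustive search is feasible only for a tiny vocabulary.

```python
import itertools
import numpy as np

def build_index(doc_term, doc_period, selector, n_periods):
    """Count, per period, the texts containing at least one selected token."""
    selected_docs = (doc_term @ selector) > 0  # boolean mask over texts
    return np.bincount(doc_period[selected_docs], minlength=n_periods).astype(float)

def best_selector(doc_term, doc_period, target):
    """Try every binary token selector; keep the one with the highest correlation."""
    n_tokens = doc_term.shape[1]
    n_periods = len(target)
    best, best_corr = None, -np.inf
    for bits in itertools.product([0, 1], repeat=n_tokens):
        selector = np.array(bits)
        index = build_index(doc_term, doc_period, selector, n_periods)
        if index.std() == 0:  # constant index: correlation is undefined
            continue
        corr = np.corrcoef(index, target)[0, 1]
        if corr > best_corr:
            best, best_corr = selector, corr
    return best, best_corr

# Hypothetical toy data: 8 texts, 3 tokens, 4 time periods.
vocabulary = ["inflation", "prices", "football"]
doc_term = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
])
doc_period = np.array([0, 0, 1, 1, 2, 3, 3, 3])  # period of each text
target = np.array([1.0, 2.0, 0.0, 3.0])          # e.g. a made-up inflation series

selector_hat, corr_hat = best_selector(doc_term, doc_period, target)
chosen_tokens = [tok for tok, keep in zip(vocabulary, selector_hat) if keep]
print(chosen_tokens, round(corr_hat, 3))
```

In this toy setting the search keeps the two economy-related tokens and drops the unrelated one, because the resulting count of selected texts tracks the target most closely. With a realistic vocabulary the number of candidate selectors grows as 2 to the power of the vocabulary size, so a heuristic or structured search would be needed instead of enumeration.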
May-16-2024
- Country:
- North America
- Canada > Quebec
- Estrie Region > Sherbrooke (0.04)
- Montreal (0.04)
- United States > Michigan (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Banking & Finance > Economy (1.00)
- Government (1.00)
- Media > News (0.68)
- Technology: