Causal Reconstruction of Sentiment Signals from Sparse News Data
Stan, Stefania, Lunghi, Marzio, Vargetto, Vito, Ricci, Claudio, Repetto, Rolands, Leo, Brayden, Gan, Shao-Hong
Sentiment signals derived from sparse news are commonly used in financial analysis and technology monitoring, yet transforming raw article-level observations into reliable temporal series remains a largely unsolved engineering problem. Rather than treating this as a classification challenge, we propose to frame it as a causal signal reconstruction problem: given probabilistic sentiment outputs from a fixed classifier, recover a stable latent sentiment series that is robust to the structural pathologies of news data such as sparsity, redundancy, and classifier uncertainty. We present a modular three-stage pipeline that (i) aggregates article-level scores onto a regular temporal grid with uncertainty-aware and redundancy-aware weights, (ii) fills coverage gaps through strictly causal projection rules, and (iii) applies causal smoothing to reduce residual noise. Because ground-truth longitudinal sentiment labels are typically unavailable, we introduce a label-free evaluation framework based on signal stability diagnostics, information preservation lag proxies, and counterfactual tests for causality compliance and redundancy robustness. As a secondary external check, we evaluate the consistency of reconstructed signals against stock-price data for a multi-firm dataset of AI-related news titles (November 2024 to February 2026). The key empirical finding is a three-week lead lag pattern between reconstructed sentiment and price that persists across all tested pipeline configurations and aggregation regimes, a structural regularity more informative than any single correlation coefficient. Overall, the results support the view that stable, deployable sentiment indicators require careful reconstruction, not only better classifiers.
Mar-26-2026
- Country:
- Asia > Singapore (0.04)
- Europe
- Switzerland (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Oxfordshire > Oxford (0.04)
- North America > Trinidad and Tobago
- Genre:
- Research Report
- Experimental Study (0.46)
- New Finding (0.46)
- Research Report
- Industry:
- Banking & Finance > Trading (1.00)
- Technology:
- Information Technology
- Artificial Intelligence
- Issues > Social & Ethical Issues (0.46)
- Machine Learning > Statistical Learning (0.46)
- Natural Language
- Discourse & Dialogue (0.68)
- Information Extraction (0.68)
- Representation & Reasoning (0.93)
- Communications > Social Media (0.93)
- Data Science (0.67)
- Artificial Intelligence
- Information Technology