From Outliers to Topics in Language Models: Anticipating Trends in News Corpora
Evangelia Zve, Benjamin Icard, Alice Breton, Lila Sainero, Gauvain Bourgne, Jean-Gabriel Ganascia
arXiv.org Artificial Intelligence
This paper examines how outliers, often dismissed as noise in topic modeling, can act as weak signals of emerging topics in dynamic news corpora. Using vector embeddings from state-of-the-art language models and a cumulative clustering approach, we track the evolution of these outliers over time in French and English news datasets on corporate social responsibility and climate change. The results reveal a consistent pattern across both models and languages: outliers tend to evolve into coherent topics over time.
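The cumulative clustering idea can be illustrated with a minimal, hypothetical sketch. It assumes a DBSCAN-style density clustering (swapped in here purely for illustration; the abstract does not say which clustering algorithm the authors use), toy 2-D points standing in for language-model embeddings, and hypothetical helper names `dbscan` and `track_outliers`. After each time step the whole corpus seen so far is re-clustered; a document first labeled as an outlier (label -1) is recorded as absorbed at the first later step that places it in a cluster.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Sequence[float]  # hypothetical stand-in for a document embedding vector
NOISE = -1  # outlier label, following the common DBSCAN convention


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_samples: int) -> List[int]:
    """Minimal DBSCAN: returns a cluster id per point, or NOISE for outliers."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return [label if label is not None else NOISE for label in labels]


def track_outliers(
    batches: Sequence[Sequence[Point]], eps: float, min_samples: int
) -> Dict[int, Tuple[int, Optional[int]]]:
    """Re-cluster the cumulative corpus after each batch (one batch per time step).

    Returns {doc_index: (step first seen as an outlier, step first placed in a cluster, or None)}.
    """
    corpus: List[Point] = []
    first_outlier: Dict[int, int] = {}
    absorbed: Dict[int, int] = {}
    for step, batch in enumerate(batches):
        corpus.extend(batch)  # cumulative: earlier documents are never dropped
        labels = dbscan(corpus, eps, min_samples)
        for doc, label in enumerate(labels):
            if label == NOISE:
                first_outlier.setdefault(doc, step)
            elif doc in first_outlier and doc not in absorbed:
                absorbed[doc] = step  # a former outlier now belongs to a cluster
    return {doc: (first, absorbed.get(doc)) for doc, first in first_outlier.items()}


# Toy data: a dense group near the origin, plus one isolated point at (5, 5)
# that only gains neighbours in the second time step.
batches = [
    [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)],
    [(5.1, 5.0), (5.0, 5.1)],
]
print(track_outliers(batches, eps=0.5, min_samples=3))  # {3: (0, 1)}
```

Here document 3 is an outlier at step 0 and becomes part of a new cluster at step 1, a toy version of the outlier-to-topic pattern the abstract describes. Because the corpus only grows, a point's neighbour count never shrinks, so under DBSCAN a document that joins a cluster stays clustered in later steps; other clustering algorithms need not have that property, which is why the sketch records only the first absorption.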
Sep-29-2025
- Country:
  - Asia (0.46)
- Genre:
  - Research Report
    - New Finding (1.00)
    - Experimental Study (1.00)
- Industry:
  - Media > News (0.93)
  - Social Sector (0.75)