Identifying Emerging Concepts in Large Corpora
arXiv.org Artificial Intelligence
We introduce a new method to identify emerging concepts in large text corpora. By analyzing changes in the heatmaps of the underlying embedding space, we are able to detect these concepts with high accuracy shortly after they originate, in turn outperforming common alternatives. We further demonstrate the utility of our approach by analyzing speeches in the U.S. Senate from 1941 to 2015. Our results suggest that the minority party is more active in introducing new concepts into the Senate discourse. We also identify specific concepts that closely correlate with the Senators' racial, ethnic, and gender identities. An implementation of our method is publicly available.
Feb-28-2025
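The abstract describes detection through changes in heatmaps of the embedding space but gives no further detail here. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes embeddings have already been projected to 2-D coordinates in the unit square, bins them into a grid for two time periods, and flags cells whose share of points has grown. The function names `density_grid` and `emerging_cells`, the grid size, and the `min_gain` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def density_grid(points, bins=4, lo=0.0, hi=1.0):
    """Bin 2-D points into a bins x bins grid; return each cell's share of points."""
    if not points:
        return {}
    width = (hi - lo) / bins
    counts = Counter()
    for x, y in points:
        # Clamp indices so points on the boundary stay inside the grid.
        i = max(0, min(int((x - lo) / width), bins - 1))
        j = max(0, min(int((y - lo) / width), bins - 1))
        counts[(i, j)] += 1
    total = len(points)
    return {cell: n / total for cell, n in counts.items()}


def emerging_cells(earlier, later, bins=4, min_gain=0.2):
    """Return grid cells whose share grew by at least min_gain (assumed threshold),
    ordered from largest to smallest gain."""
    before = density_grid(earlier, bins)
    after = density_grid(later, bins)
    gains = {cell: share - before.get(cell, 0.0) for cell, share in after.items()}
    flagged = [cell for cell, gain in gains.items() if gain >= min_gain]
    return sorted(flagged, key=lambda cell: -gains[cell])


# Toy data (invented for illustration): a new cluster appears in the later period.
earlier = [(0.1, 0.1), (0.15, 0.2), (0.6, 0.6), (0.7, 0.65)]
later = [(0.1, 0.1), (0.6, 0.6), (0.9, 0.9), (0.95, 0.85), (0.88, 0.92), (0.8, 0.9)]
print(emerging_cells(earlier, later))
```

On this toy data, only the cell containing the new cluster is flagged. A real pipeline would need a principled projection, smoothing, and a statistically motivated threshold, none of which are specified in the abstract.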
- Country:
- Africa > Middle East (0.04)
- Asia
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Eastern Europe (0.04)
- Middle East (0.04)
- Russia (0.04)
- Spain (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Italy > Tuscany
- Florence (0.04)
- Germany > Berlin (0.04)
- North America
- Central America (0.04)
- Grenada (0.04)
- Puerto Rico (0.04)
- United States
- California > Santa Clara County
- Palo Alto (0.04)
- Indiana (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Oregon (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Government
- Foreign Policy (1.00)
- Military (1.00)
- Regional Government > North America Government
- United States Government (1.00)
- Health & Medicine (1.00)
- Law
- Civil Rights & Constitutional Law (1.00)
- Environmental Law (1.00)
- Statutes (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Technology: