Topics as Entity Clusters: Entity-based Topics from Language Models and Graph Neural Networks
Loureiro, Manuel V., Derby, Steven, Wijaya, Tri Kurniawan
arXiv.org Artificial Intelligence
Topic models aim to reveal the latent structure behind a corpus, typically by operating over a bag-of-words representation of documents. In the context of topic modeling, most of the vocabulary is either irrelevant for uncovering the underlying topics or strongly related to other relevant concepts, which hurts the interpretability of the resulting topics. Furthermore, the limited expressiveness and language dependence of such models demand considerable computational resources. Hence, we propose a novel approach to cluster-based topic modeling that employs conceptual entities. Entities are language-agnostic representations of real-world concepts that are rich in relational information. To this end, we extract vector representations of entities from (i) an encyclopedic corpus, using a language model; and (ii) a knowledge base, using a graph neural network. We demonstrate that our approach consistently outperforms other state-of-the-art topic models across coherence metrics, and we find that the explicit knowledge encoded in the graph-based embeddings yields more coherent topics than the implicit knowledge encoded in the contextualized embeddings of language models.
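To make the idea of "topics as entity clusters" concrete, here is a minimal sketch, assuming entity embeddings are already available as plain vectors (for example, from a language model or a graph neural network). The abstract does not name the clustering algorithm, so plain k-means with deterministic farthest-first seeding is used here as a stand-in; each cluster is treated as a topic and described by the entities closest to its centroid. The function names (`kmeans`, `entity_topics`) and the two-dimensional toy embeddings are hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(vectors: List[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding: start at the first vector, then repeatedly
    add the vector farthest from every centroid chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors: List[Vector], k: int, iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's k-means (stand-in clustering step, an assumption).
    Returns one cluster label per vector and the final centroids."""
    centroids = farthest_first_init(vectors, k)
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def entity_topics(embeddings: Dict[str, Vector], k: int, top_n: int = 10) -> List[List[str]]:
    """Hypothetical helper: cluster entity vectors into k topics and list
    each topic's top_n entities, ordered by distance to the centroid."""
    names = list(embeddings)
    labels, centroids = kmeans([embeddings[n] for n in names], k)
    topics = []
    for c in range(k):
        members = [n for n, lab in zip(names, labels) if lab == c]
        members.sort(key=lambda n: squared_distance(embeddings[n], centroids[c]))
        topics.append(members[:top_n])
    return topics


# Toy 2-D "entity embeddings" (hypothetical values, for illustration only).
toy = {
    "Linux": [1.0, 0.0], "Compiler": [0.9, 0.1], "Python": [0.95, 0.05],
    "Paris": [0.0, 1.0], "Berlin": [0.1, 0.9], "Madrid": [0.05, 0.95],
}

if __name__ == "__main__":
    for i, topic in enumerate(entity_topics(toy, k=2, top_n=3)):
        print(i, topic)
```

With real embeddings the vectors would have hundreds of dimensions and the number of topics `k` would be a tuning choice; ranking members by centroid distance is just one simple way to pick representative entities for each topic.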
Jan-6-2023