AutoTM 2.0: Automatic Topic Modeling Framework for Documents Analysis
Khodorchenko, Maria; Butakov, Nikolay; Zuev, Maxim; Nasonov, Denis
arXiv.org Artificial Intelligence
Topic modeling is a well-known technique for modeling the internal structure of a text corpus, represented as a set of interrelated word sets known as topics. From Latent Semantic Analysis (LSA) [1] and Non-negative Matrix Factorization (NMF) [2] to probabilistic and neural approaches, topic modeling has proved to be a valuable tool for solving a range of practical tasks [3, 4]. One of its key features lies in the interpretability of the resulting representations, which makes complex datasets easier to comprehend and helps extract meaningful insights. To be useful, topic models should be flexible enough to model corpora of different nature, origin, and language. This requires the model to be carefully tuned for the corpus at hand, and the tuning effort is usually closely connected with the number of hyperparameters the model has. This is especially true for additively regularized topic models, a semi-probabilistic group of methods that show great adaptability but require setting a large number of parameters and the expertise to do so properly. This paper presents AutoTM 2.0, a framework that allows effective use of additively regularized models, as they provide the most flexible way to process datasets with different statistical characteristics. Our main contributions can be summarized as follows: a significant simplification of the use of flexible additively regularized models through automatic single-objective optimization procedures, and metrics that closely align with human judgment.
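For readers unfamiliar with additive regularization, the following is a sketch of the general objective as it is commonly written in the ARTM literature. It is not taken from this abstract, and the symbols are assumptions introduced here for illustration: $n_{dw}$ is the count of word $w$ in document $d$, $\phi_{wt}$ and $\theta_{td}$ are the word-in-topic and topic-in-document probabilities, $R_i$ are regularizers, and $\tau_i$ are their coefficients. The coefficients $\tau_i$, together with the number of topics, are the kind of hyperparameters whose tuning the abstract describes as demanding.

```latex
% General ARTM objective (standard form from the literature; notation assumed here)
\sum_{d \in D} \sum_{w \in d} n_{dw} \ln \sum_{t \in T} \phi_{wt}\,\theta_{td}
  \;+\; \sum_{i=1}^{k} \tau_i\, R_i(\Phi, \Theta)
  \;\to\; \max_{\Phi,\,\Theta}

% Constraints: columns of \Phi and \Theta are probability distributions
\phi_{wt} \ge 0, \quad \sum_{w \in W} \phi_{wt} = 1, \qquad
\theta_{td} \ge 0, \quad \sum_{t \in T} \theta_{td} = 1
```

With all $\tau_i = 0$ the objective reduces to the plain log-likelihood of a probabilistic topic model; each nonzero $\tau_i$ adds one more value that has to be chosen for the corpus.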
Oct-1-2024
- Country:
  - South America > Paraguay
  - North America > United States
    - New York > New York County
      - New York City (0.05)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - California > San Francisco County
      - San Francisco (0.14)
  - Europe
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Russia > Northwestern Federal District
      - Leningrad Oblast > Saint Petersburg (0.05)
    - Italy > Tuscany
      - Florence (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
  - Asia
- Genre:
  - Research Report (1.00)
- Technology: