Determination of the Number of Topics Intrinsically: Is It Possible?
Bulatov, Victor, Alekseev, Vasiliy, Vorontsov, Konstantin
arXiv.org Artificial Intelligence
The number of topics may be the most important parameter of a topic model. The topic modelling community has developed a variety of procedures for estimating the number of topics in a dataset, but existing practices have not yet been compared in a sufficiently complete way. This study attempts to partially fill this gap by investigating the performance of various methods applied to several topic models on a number of publicly available corpora. The analysis demonstrates that intrinsic methods are far from reliable or accurate. The number of topics is shown to be a method- and model-dependent quantity rather than an absolute property of a particular corpus. We conclude that other methods for dealing with this problem should be developed and suggest some promising directions for further research.
Jun-14-2024
- Country:
  - Europe
    - United Kingdom (0.14)
    - Russia > Central Federal District
      - Moscow Oblast > Moscow (0.04)
  - Asia
    - Russia (0.04)
    - Middle East > Jordan (0.04)
- Genre:
  - Research Report > New Finding (0.46)
- Technology: