Analysing the Impact of Removing Infrequent Words on Topic Quality in LDA Models
Bystrov, Victor; Naboka-Krell, Viktoriia; Staszewska-Bystrova, Anna; Winker, Peter
arXiv.org Artificial Intelligence
The use of topic modelling techniques, especially Latent Dirichlet Allocation (LDA) introduced by Blei et al. (2003), is growing fast, and these methods find application in a broad variety of domains. In text-as-data applications, LDA enables the unsupervised analysis of large text collections by uncovering the latent structures behind the data. As LDA becomes a standard tool for empirical analysis, interest in the details of the method, and in particular in the parameter settings for its implementation, is also rising. Accordingly, since the introduction of LDA, several of its methodological components have been studied in more detail, for example, the choice of the number of topics (Cao et al., 2009; Mimno et al., 2011; Lewis and Grossetti, 2022; Bystrov et al., 2022a), hyper-parameter settings (Wallach et al., 2009), model design (e.g.
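Because the paper concerns the removal of infrequent words before fitting LDA, a minimal sketch of that preprocessing step may help. The Python below assumes that "infrequent" is measured by document frequency (the number of documents in which a word occurs) and that removal means dropping every token below a threshold `min_df`; both the function name and `min_df` are hypothetical, and the criterion and thresholds actually studied by the authors are not given in this excerpt. The pruned corpus would then be passed to an LDA implementation of choice.

```python
from collections import Counter
from typing import List


def remove_infrequent_words(docs: List[List[str]], min_df: int) -> List[List[str]]:
    """Drop every token that occurs in fewer than `min_df` documents.

    Hypothetical helper for illustration: `min_df` is an assumed
    document-frequency threshold, not a value taken from the paper.
    """
    # Document frequency: each word is counted at most once per document.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    # Retained vocabulary: words reaching the threshold.
    vocab = {word for word, df in doc_freq.items() if df >= min_df}
    # Filter each document, preserving the original token order.
    return [[word for word in doc if word in vocab] for doc in docs]


if __name__ == "__main__":
    corpus = [
        ["bank", "rate", "inflation", "rate"],
        ["bank", "loan", "rate"],
        ["inflation", "wage", "bank"],
    ]
    pruned = remove_infrequent_words(corpus, min_df=2)
    # "loan" and "wage" each occur in only one document and are removed.
    print(pruned)
    # → [['bank', 'rate', 'inflation', 'rate'], ['bank', 'rate'], ['inflation', 'bank']]
```

Raising `min_df` shrinks the vocabulary further; how such pruning affects topic quality is the question the paper investigates.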
Nov-24-2023