
Discriminative Topic Modeling with Logistic LDA

Neural Information Processing Systems

Despite many years of research into latent Dirichlet allocation (LDA), applying LDA to collections of non-categorical items is still challenging for practitioners. Yet many problems with much richer data share a similar structure and could benefit from the vast literature on LDA. We propose logistic LDA, a novel discriminative variant of latent Dirichlet allocation which is easy to apply to arbitrary inputs. In particular, our model can easily be applied to groups of images or arbitrary text embeddings, and can integrate deep neural networks. Although it is a discriminative model, we show that logistic LDA can learn from unlabeled data in an unsupervised manner by exploiting the group structure present in the data. In contrast to other recent topic models designed to handle arbitrary inputs, our model does not sacrifice the interpretability and principled motivation of LDA.



Quantifying consistency and accuracy of Latent Dirichlet Allocation

Magsarjav, Saranzaya, Humphries, Melissa, Tuke, Jonathan, Mitchell, Lewis

arXiv.org Artificial Intelligence

Topic modelling in Natural Language Processing uncovers hidden topics in large, unlabelled text datasets. It is widely applied in fields such as information retrieval, content summarisation, and trend analysis across various disciplines. However, probabilistic topic models can produce different results when rerun due to their stochastic nature, leading to inconsistencies in latent topics. Factors like corpus shuffling, rare text removal, and document elimination contribute to these variations. This instability affects replicability, reliability, and interpretation, raising concerns about whether topic models capture meaningful topics or just noise. To address these problems, we define a new stability measure that incorporates accuracy and consistency and uses the generative properties of LDA to generate a new corpus with ground truth. These generated corpora are run through LDA 50 times to determine the variability in the output. We show that LDA can correctly determine the underlying number of topics in the documents. We also find that LDA is more internally consistent, as the multiple reruns return similar topics; however, these topics are not the true topics.
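The ground-truth corpora described above follow LDA's own generative story: draw topic-word distributions, then per-document topic proportions, then each word via a sampled topic. A minimal sketch of that construction (all sizes and hyperparameters here are illustrative choices, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: topics, vocabulary, documents, words per document.
K, V, D, N = 3, 50, 200, 40
alpha, beta = 0.1, 0.01

# Ground-truth topic-word distributions (K rows, each a distribution over V words).
phi = rng.dirichlet(np.full(V, beta), size=K)

docs = []
for _ in range(D):
    theta = rng.dirichlet(np.full(K, alpha))   # document's topic proportions
    z = rng.choice(K, size=N, p=theta)         # a topic for each word slot
    w = np.array([rng.choice(V, p=phi[k]) for k in z])  # word from its topic
    docs.append(w)

# Each document is a length-N array of word ids drawn from the mixture.
assert all(len(w) == N for w in docs)
```

Rerunning an LDA implementation many times (50 in the paper) on a corpus built this way, and comparing the recovered topics against the known `phi`, is what makes the accuracy and consistency components of the stability measure computable.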


Evaluating BERTopic on Open-Ended Data: A Case Study with Belgian Dutch Daily Narratives

Kandala, Ratna, Vanhasbroeck, Niels, Hoemann, Katie

arXiv.org Artificial Intelligence

While traditional probabilistic models such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003) have been foundational, their underlying bag-of-words assumption limits their ability to capture complex semantics. A recent paradigm shift towards models like BERTopic (Grootendorst, 2022), a state-of-the-art (SOTA) model which leverages contextualized embeddings from pre-trained transformers, has shown significant promise in generating more semantically coherent topics. These models can capture nuanced relationships, including domain-specific named entities and morphologically rich constructs, critical for linguistically complex data. However, despite this progress, two significant gaps persist in the literature. First, research has overwhelmingly focused on high-resource, standardized languages, leaving under-resourced languages largely unexplored. This focus not only limits the generalizability of existing models but also risks perpetuating a technological bias in which the nuances of smaller linguistic communities are overlooked. Models trained on standard corpora often fail to capture the unique lexical and semantic patterns of regional dialects or sociolects, leading to a superficial or even inaccurate understanding of the underlying discourse (Kamiloglu, 2025). Second, the predominant application domain has been structured or short-form text such as news articles or social media posts (Egger et al., 2022; Schäfer et al., 2024), while the challenges of modeling unstructured, open-ended personal narratives have received less attention. Distinct from the short-form, often decontextualized nature of social media data, daily narratives provide granular, contextually-grounded accounts of lived experience.


Topic Analysis with Side Information: A Neural-Augmented LDA Approach

Fang, Biyi, Vo, Truong, Rajshekhar, Kripa, Klabjan, Diego

arXiv.org Machine Learning

Traditional topic models such as Latent Dirichlet Allocation (LDA) have been widely used to uncover latent structures in text corpora, but they often struggle to integrate auxiliary information such as metadata, user attributes, or document labels. These limitations restrict their expressiveness, personalization, and interpretability. To address this, we propose nnLDA, a neural-augmented probabilistic topic model that dynamically incorporates side information through a neural prior mechanism. nnLDA models each document as a mixture of latent topics, where the prior over topic proportions is generated by a neural network conditioned on auxiliary features. This design allows the model to capture complex nonlinear interactions between side information and topic distributions that static Dirichlet priors cannot represent. We develop a stochastic variational Expectation-Maximization algorithm to jointly optimize the neural and probabilistic components. Across multiple benchmark datasets, nnLDA consistently outperforms LDA and Dirichlet-Multinomial Regression in topic coherence, perplexity, and downstream classification. These results highlight the benefits of combining neural representation learning with probabilistic topic modeling in settings where side information is available.
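The key idea in the abstract above is replacing vanilla LDA's static Dirichlet prior with one whose parameters are produced by a neural network conditioned on side information. A minimal sketch of that mechanism (the network shape, sizes, and fixed random weights are illustrative assumptions; in nnLDA the network is learned jointly via stochastic variational EM):

```python
import numpy as np

rng = np.random.default_rng(1)

K, F, H = 5, 8, 16  # topics, side-info features, hidden units (assumed sizes)

# Illustrative fixed weights of a one-hidden-layer network; nnLDA would
# learn these jointly with the probabilistic components.
W1, b1 = rng.normal(size=(F, H)), np.zeros(H)
W2, b2 = rng.normal(size=(H, K)), np.zeros(K)

def neural_dirichlet_prior(x):
    """Map side information x to positive Dirichlet concentration parameters.

    The softplus keeps every concentration strictly positive, so the output
    is a valid Dirichlet parameter vector that varies nonlinearly with x,
    unlike the static symmetric prior of vanilla LDA.
    """
    h = np.tanh(x @ W1 + b1)
    return np.log1p(np.exp(h @ W2 + b2))  # softplus

x = rng.normal(size=F)             # e.g. metadata or user-attribute features
alpha = neural_dirichlet_prior(x)  # document-specific Dirichlet parameters
theta = rng.dirichlet(alpha)       # topic proportions drawn under that prior
```

Two documents with different side information thus get different priors over their topic proportions, which is exactly the dependence a fixed Dirichlet cannot express.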


Enhancing Cloud Security through Topic Modelling

Saleh, Sabbir M., Madhavji, Nazim, Steinbacher, John

arXiv.org Artificial Intelligence

Protecting cloud applications is critical in an era where security threats are increasingly sophisticated and persistent. Continuous Integration and Continuous Deployment (CI/CD) pipelines are particularly vulnerable, making innovative security approaches essential. This research explores the application of Natural Language Processing (NLP) techniques, specifically Topic Modelling, to analyse security-related text data and anticipate potential threats. We focus on Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA) to extract meaningful patterns from data sources, including logs, reports, and deployment traces. Using the Gensim framework in Python, these methods categorise log entries into security-relevant topics (e.g., phishing, encryption failures). The identified topics are leveraged to highlight patterns indicative of security issues across CI/CD's continuous stages (build, test, deploy). This approach introduces a semantic layer that supports early vulnerability recognition and contextual understanding of runtime behaviours.
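The paper applies Gensim's LDA to real log data; as a dependency-free illustration of the same technique, the sketch below runs a tiny collapsed Gibbs sampler for LDA over invented toy "log entries" (the entries, topic count, and hyperparameters are all assumptions for demonstration, not the paper's data or settings):

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented toy log entries standing in for CI/CD security logs.
logs = [
    "phishing link detected in build email",
    "tls handshake failed encryption error deploy",
    "phishing attempt blocked user email",
    "encryption key rotation failed during deploy",
]
docs = [s.split() for s in logs]
vocab = sorted({w for d in docs for w in d})
wid = {w: i for i, w in enumerate(vocab)}
W = [[wid[w] for w in d] for d in docs]

K, V, D = 2, len(vocab), len(docs)
alpha, beta = 0.5, 0.1

# Random initial topic assignments and the count tables Gibbs sampling needs.
z = [[int(rng.integers(K)) for _ in d] for d in W]
ndk = np.zeros((D, K)); nkv = np.zeros((K, V)); nk = np.zeros(K)
for d, doc in enumerate(W):
    for i, v in enumerate(doc):
        k = z[d][i]; ndk[d, k] += 1; nkv[k, v] += 1; nk[k] += 1

# Collapsed Gibbs sampling: resample each token's topic from its
# conditional given all other assignments.
for _ in range(200):
    for d, doc in enumerate(W):
        for i, v in enumerate(doc):
            k = z[d][i]
            ndk[d, k] -= 1; nkv[k, v] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkv[:, v] + beta) / (nk + V * beta)
            k = int(rng.choice(K, p=p / p.sum()))
            z[d][i] = k
            ndk[d, k] += 1; nkv[k, v] += 1; nk[k] += 1

# Top words per topic: the kind of security-relevant themes the paper
# surfaces (e.g. phishing vs. encryption failures).
for k in range(K):
    print(f"topic {k}:", [vocab[v] for v in np.argsort(-nkv[k])[:3]])
```

On real pipelines the same categorisation runs over build, test, and deploy logs, so recurring topics can flag emerging security issues stage by stage.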


285f89b802bcb2651801455c86d78f2a-Reviews.html

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The authors describe two novel inference methods for the correlated topic model (CTM). They build on analytic results for the conditional logistic-normal likelihood to arrive at a fast, easily parallelized exact inference scheme, which leads to an approximate method for sampling Polya-Gamma variates. Finally, they propose a method for efficiently drawing samples in the presence of sparsity.