AITopics | document-topic distribution

Collaborating Authors

document-topic distribution

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

XTRA: Cross-Lingual Topic Modeling with Topic and Representation Alignments

Nguyen, Tien Phat, Ngo, Vu Minh, Nguyen, Tung, Van Ngo, Linh, Nguyen, Duc Anh, Dinh, Sang, Le, Trung

arXiv.org Artificial IntelligenceOct-6-2025

Cross-lingual topic modeling aims to uncover shared semantic themes across languages. Several methods have been proposed to address this problem, leveraging both traditional and neural approaches. While previous methods have achieved some improvements in topic diversity, they often struggle to ensure high topic coherence and consistent alignment across languages. We propose XTRA (Cross-Lingual Topic Modeling with Topic and Representation Alignments), a novel framework that unifies Bag-of-Words modeling with multilingual embeddings. XTRA introduces two core components: (1) representation alignment, aligning document-topic distributions via contrastive learning in a shared semantic space; and (2) topic alignment, projecting topic-word distributions into the same space to enforce crosslingual consistency. This dual mechanism enables XTRA to learn topics that are interpretable (coherent and diverse) and well-aligned across languages. Experiments on multilingual corpora confirm that XTRA significantly outperforms strong baselines in topic coherence, diversity, and alignment quality. Code and reproducible scripts are available at https: //github.com/tienphat140205/XTRA.

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.02788

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.46)

Genre:

Research Report (0.64)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

DiffETM: Diffusion Process Enhanced Embedded Topic Model

Shao, Wei, Liu, Mingyang, Song, Linqi

arXiv.org Artificial IntelligenceJan-1-2025

The embedded topic model (ETM) is a widely used approach that assumes the sampled document-topic distribution conforms to the logistic normal distribution for easier optimization. However, this assumption oversimplifies the real document-topic distribution, limiting the model's performance. In response, we propose a novel method that introduces the diffusion process into the sampling process of document-topic distribution to overcome this limitation and maintain an easy optimization process. We validate our method through extensive experiments on two mainstream datasets, proving its effectiveness in improving topic modeling performance.

document-topic distribution, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.00862

Country: Asia > China (0.16)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.75)

Add feedback

Concentrated Document Topic Model

Lei, Hao, Chen, Ying

arXiv.org Machine LearningFeb-6-2021

We propose a Concentrated Document Topic Model(CDTM) for unsupervised text classification, which is able to produce a concentrated and sparse document topic distribution. In particular, an exponential entropy penalty is imposed on the document topic distribution. Documents that have diverse topic distributions are penalized more, while those having concentrated topics are penalized less. We apply the model to the benchmark NIPS dataset and observe more coherent topics and more concentrated and sparse document-topic distributions than Latent Dirichlet Allocation(LDA).

concentrated document topic model, document-topic distribution, entropy, (14 more...)

arXiv.org Machine Learning

2102.04449

Country:

Asia > Singapore (0.05)
Europe > United Kingdom (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Topic-Based Dissimilarity and Sensitivity Models for Translation Rule Selection

Zhang, M., Xiao, X., Xiong, D., Liu, Q.

Journal of Artificial Intelligence ResearchMay-8-2014

Translation rule selection is a task of selecting appropriate translation rules for an ambiguous source-language segment. As translation ambiguities are pervasive in statistical machine translation, we introduce two topic-based models for translation rule selection which incorporates global topic information into translation disambiguation. We associate each synchronous translation rule with source- and target-side topic distributions.With these topic distributions, we propose a topic dissimilarity model to select desirable (less dissimilar) rules by imposing penalties for rules with a large value of dissimilarity of their topic distributions to those of given documents. In order to encourage the use of non-topic specific translation rules, we also present a topic sensitivity model to balance translation rule selection between generic rules and topic-specific rules. Furthermore, we project target-side topic distributions onto the source-side topic model space so that we can benefit from topic information of both the source and target language. We integrate the proposed topic dissimilarity and sensitivity model into hierarchical phrase-based machine translation for synchronous translation rule selection. Experiments show that our topic-based translation rule selection model can substantially improve translation quality.

topic model, translation, translation rule, (12 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4265

AI Access Foundation

10878

Journal of Artificial Intelligence Research

Country:

Europe > Czechia > Prague (0.04)
Asia > South Korea (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(14 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback