AITopics | topic assignment

Collaborating Authors

topic assignment

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Topic Modeling Revisited: A Document Graph-based Neural Network Perspective

Neural Information Processing SystemsFeb-9-2026, 11:55:17 GMT

Meanwhile, a Neural V ariational Inference (NVI) approach is proposed to learn our model with graph neural networks to encode the document graphs. Besides, we theoretically demonstrate that Latent Dirichlet Allocation (LDA) can be derived from GNTM as a special case with similar objective functions.

machine learning, natural language, topic model, (20 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)

Industry: Leisure & Entertainment > Sports > Hockey (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

ClusterFusion: Hybrid Clustering with Embedding Guidance and LLM Adaptation

Xu, Yiming, Yuan, Yuan, Viswanathan, Vijay, Neubig, Graham

arXiv.org Artificial IntelligenceDec-5-2025

Text clustering is a fundamental task in natural language processing, yet traditional clustering algorithms with pre-trained embeddings often struggle in domain-specific contexts without costly fine-tuning. Large language models (LLMs) provide strong contextual reasoning, yet prior work mainly uses them as auxiliary modules to refine embeddings or adjust cluster boundaries. We propose ClusterFusion, a hybrid framework that instead treats the LLM as the clustering core, guided by lightweight embedding methods. The framework proceeds in three stages: embedding-guided subset partition, LLM-driven topic summarization, and LLM-based topic assignment. This design enables direct incorporation of domain knowledge and user preferences, fully leveraging the contextual adaptability of LLMs. Experiments on three public benchmarks and two new domain-specific datasets demonstrate that ClusterFusion not only achieves state-of-the-art performance on standard tasks but also delivers substantial gains in specialized domains. To support future work, we release our newly constructed dataset and results on all benchmarks.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2512.0435

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

9f1d5659d5880fb427f6e04ae500fc25-Paper.pdf

Neural Information Processing SystemsAug-15-2025, 11:44:28 GMT

computational linguistic, topic model, variational inference, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland > Baltimore (0.14)
North America > United States > Maryland > Baltimore County (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(7 more...)

Genre: Research Report (0.67)

Industry:

Media > Film (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

Add feedback

7b6982e584636e6a1cda934f1410299c-Paper.pdf

Neural Information Processing SystemsAug-15-2025, 08:49:32 GMT

dependency, latexit sha1, topic model, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)

Industry: Leisure & Entertainment > Sports > Hockey (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Improving customer service with automatic topic detection in user emails

Bašaragin, Bojana, Medvecki, Darija, Gojić, Gorana, Oparnica, Milena, Mišković, Dragiša

arXiv.org Artificial IntelligenceFeb-26-2025

This study introduces a novel Natural Language Processing pipeline that enhances customer service efficiency at Telekom Srbija, a leading Serbian telecommunications company, through automated email topic detection and labelling. Central to the pipeline is BERTopic, a modular architecture that allows unsupervised topic modelling. After a series of preprocessing and post-processing steps, we assign one of 12 topics and several additional labels to incoming emails, allowing customer service to filter and access them through a custom-made application. The model's performance was evaluated by assessing the speed and correctness of the automatically assigned topics across a test dataset of 100 customer emails. The pipeline shows broad applicability across languages, particularly for those that are low-resourced and morphologically rich. The system now operates in the company's production environment, streamlining customer service operations through automated email classification.

bertopic, email, representation, (15 more...)

arXiv.org Artificial Intelligence

2502.19115

Country:

North America > United States > Hawaii (0.04)
Europe > Switzerland (0.04)
Europe > Serbia > Šumadija and Western Serbia > Raška District > Novi Pazar (0.04)
(3 more...)

Genre: Research Report (0.83)

Industry: Telecommunications (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

LITA: An Efficient LLM-assisted Iterative Topic Augmentation Framework

Chang, Chia-Hsuan, Tsai, Jui-Tse, Tsai, Yi-Hang, Hwang, San-Yih

arXiv.org Artificial IntelligenceDec-16-2024

Topic modeling is widely used for uncovering thematic structures within text corpora, yet traditional models often struggle with specificity and coherence in domain-focused applications. Guided approaches, such as SeededLDA and CorEx, incorporate user-provided seed words to improve relevance but remain labor-intensive and static. Large language models (LLMs) offer potential for dynamic topic refinement and discovery, yet their application often incurs high API costs. To address these challenges, we propose the LLM-assisted Iterative Topic Augmentation framework (LITA), an LLM-assisted approach that integrates user-provided seeds with embedding-based clustering and iterative refinement. LITA identifies a small number of ambiguous documents and employs an LLM to reassign them to existing or new topics, minimizing API costs while enhancing topic quality. Experiments on two datasets across topic quality and clustering performance metrics demonstrate that LITA outperforms five baseline models, including LDA, SeededLDA, CorEx, BERTopic, and PromptTopic. Our work offers an efficient and adaptable framework for advancing topic modeling and text clustering.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.12459

Country:

North America > United States > Connecticut > New Haven County > New Haven (0.04)
Asia > Taiwan > Takao Province > Kaohsiung (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)

Add feedback

MixEHR-Nest: Identifying Subphenotypes within Electronic Health Records through Hierarchical Guided-Topic Modeling

Wang, Ruohan, Wang, Zilong, Song, Ziyang, Buckeridge, David, Li, Yue

arXiv.org Artificial IntelligenceOct-17-2024

Automatic subphenotyping from electronic health records (EHRs)provides numerous opportunities to understand diseases with unique subgroups and enhance personalized medicine for patients. However, existing machine learning algorithms either focus on specific diseases for better interpretability or produce coarse-grained phenotype topics without considering nuanced disease patterns. In this study, we propose a guided topic model, MixEHR-Nest, to infer sub-phenotype topics from thousands of disease using multi-modal EHR data. Specifically, MixEHR-Nest detects multiple subtopics from each phenotype topic, whose prior is guided by the expert-curated phenotype concepts such as Phenotype Codes (PheCodes) or Clinical Classification Software (CCS) codes. We evaluated MixEHR-Nest on two EHR datasets: (1) the MIMIC-III dataset consisting of over 38 thousand patients from intensive care unit (ICU) from Beth Israel Deaconess Medical Center (BIDMC) in Boston, USA; (2) the healthcare administrative database PopHR, comprising 1.3 million patients from Montreal, Canada. Experimental results demonstrate that MixEHR-Nest can identify subphenotypes with distinct patterns within each phenotype, which are predictive for disease progression and severity. Consequently, MixEHR-Nest distinguishes between type 1 and type 2 diabetes by inferring subphenotypes using CCS codes, which do not differentiate these two subtype concepts. Additionally, MixEHR-Nest not only improved the prediction accuracy of short-term mortality of ICU patients and initial insulin treatment in diabetic patients but also revealed the contributions of subphenotypes. For longitudinal analysis, MixEHR-Nest identified subphenotypes of distinct age prevalence under the same phenotypes, such as asthma, leukemia, epilepsy, and depression. The MixEHR-Nest software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-Nest.

machine learning, natural language, subphenotype, (18 more...)

arXiv.org Artificial Intelligence

2410.13217

Country:

North America > Canada > Quebec > Montreal (0.67)
Asia > Middle East > Israel (0.24)
Asia > China > Guangdong Province > Shenzhen (0.05)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Hematology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.35)

Add feedback

Interactive Topic Models with Optimal Transport

Dhanania, Garima, Mysore, Sheshera, Pham, Chau Minh, Iyyer, Mohit, Zamani, Hamed, McCallum, Andrew

arXiv.org Artificial IntelligenceJun-28-2024

Topic models are widely used to analyze document collections. While they are valuable for discovering latent topics in a corpus when analysts are unfamiliar with the corpus, analysts also commonly start with an understanding of the content present in a corpus. This may be through categories obtained from an initial pass over the corpus or a desire to analyze the corpus through a predefined set of categories derived from a high level theoretical framework (e.g. political ideology). In these scenarios analysts desire a topic modeling approach which incorporates their understanding of the corpus while supporting various forms of interaction with the model. In this work, we present EdTM, as an approach for label name supervised topic modeling. EdTM models topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities and using optimal transport for making globally coherent topic-assignments. In experiments, we show the efficacy of our framework compared to few-shot LLM classifiers, and topic models based on clustering and LDA. Further, we show EdTM's ability to incorporate various forms of analyst feedback and while remaining robust to noisy analyst inputs.

assignment, topic assignment, topic modeling, (14 more...)

arXiv.org Artificial Intelligence

2406.19928

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > New York > New York County > New York City (0.05)
Asia > Middle East > Jordan (0.05)
(12 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)

Add feedback

MixEHR-SurG: a joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records

Li, Yixuan, Marelli, Ariane, Yang, Archer Y., Li, Yue

arXiv.org Artificial IntelligenceDec-20-2023

Objective: To improve survival analysis using EHR data, we aim to develop a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard. Materials and Methods: Our technical contributions are three-folds: (1) integrating EHR topic inference with Cox proportional hazards likelihood; (2) inferring patient-specific topic hyperparameters using the PheCode concepts such that each topic can be identified with exactly one PheCode-associated phenotype; (3) multi-modal survival topic inference. This leads to a highly interpretable survival and guided topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-G using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data consisting of 8,211 subjects with 75,187 outpatient claim data of 1,767 unique ICD codes; the MIMIC-III consisting of 1,458 subjects with multi-modal EHR records. Results: Compared to the baselines, MixEHR-G achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC score of 0.89 in the simulation dataset and a mean AUROC of 0.645 on the CHD dataset. Qualitatively, MixEHR-G associates severe cardiac conditions with high mortality risk among the CHD patients after the first heart failure hospitalization and critical brain injuries with increased mortality among the MIMIC-III patients after their ICU discharge. Conclusion: The integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG led to not only competitive mortality prediction but also meaningful phenotype topics for systematic survival analysis. The software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-SurG.

mixehr-surg, phenotype, survival time, (15 more...)

arXiv.org Artificial Intelligence

2312.13454

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.94)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.91)

Add feedback

TopicGPT: A Prompt-based Topic Modeling Framework

Pham, Chau Minh, Hoyle, Alexander, Sun, Simeng, Iyyer, Mohit

arXiv.org Artificial IntelligenceNov-2-2023

Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal semantic control over topics. To tackle these issues, we introduce TopicGPT, a prompt-based framework that uses large language models (LLMs) to uncover latent topics within a provided text collection. TopicGPT produces topics that align better with human categorizations compared to competing methods: for example, it achieves a harmonic mean purity of 0.74 against human-annotated Wikipedia topics compared to 0.64 for the strongest baseline. Its topics are also more interpretable, dispensing with ambiguous bags of words in favor of topics with natural language labels and associated free-form descriptions. Moreover, the framework is highly adaptable, allowing users to specify constraints and modify topics without the need for model retraining. TopicGPT can be further extended to hierarchical topical modeling, enabling users to explore topics at various levels of granularity. By streamlining access to high-quality and interpretable topics, TopicGPT represents a compelling, human-centered approach to topic modeling.

assignment, dataset, topicgpt, (15 more...)

arXiv.org Artificial Intelligence

2311.01449

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Minnesota (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry:

Transportation (1.00)
Law (1.00)
Consumer Products & Services > Restaurants (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback