AITopics | latent dirichlet allocation

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

arXiv.org Artificial Intelligence, Dec-2-2025

Text Mining Analysis of Symptom Patterns in Medical Chatbot Conversations

Razavi, Hamed

The rapid growth of digital health systems has created a need to better understand how they interpret and represent patient-reported symptoms. Chatbots have been used in healthcare to provide clinical support and enhance the user experience, making it possible to extract meaningful clinical patterns from text-based chatbot data. The proposed research uses several natural language processing methods to study how symptoms are described and to analyse the patterns that emerge in conversations with medical chatbots. Using the Medical Conversations to Disease Dataset, which contains 960 multi-turn dialogues across 24 clinical conditions, a standardised representation of patient-bot conversations is created for computational analysis. The multi-method approach combines Latent Dirichlet Allocation (LDA) to identify latent symptom themes, K-Means to group symptom descriptions by similarity, Transformer-based Named Entity Recognition (NER) to extract medical concepts, and the Apriori algorithm to discover frequent symptom pairs. The analysis reveals a coherent structure of clinically relevant topics, moderate clustering cohesiveness, and several high-confidence associations between symptoms, such as fever with headache and rash with itchiness. The results support the notion that conversational medical data can provide a valuable diagnostic signal for early symptom interpretation, strengthen decision support, and improve how users interact with telehealth technology. By demonstrating a method for converting unstructured, free-flowing dialogue into actionable knowledge about symptoms, this work provides an extensible framework for improving the performance, dependability, and clinical utility of future medical chatbots.
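
The frequent-symptom-pair step can be sketched with the first two levels of Apriori: count single symptoms, keep only those meeting a minimum support (Apriori's pruning property), then count pairs and derive directed rules with confidence = support(A and B) / support(A). This is a minimal illustration, not the study's implementation; the toy conversations, thresholds, and the function name symptom_pair_rules are hypothetical, and symptom extraction (for example by the NER step) is assumed to have already happened.

```python
from collections import Counter
from itertools import combinations


def symptom_pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) rules for symptom pairs.

    transactions: one collection of symptom labels per conversation (hypothetical input).
    """
    n = len(transactions)
    # Count how many conversations mention each symptom (at most once per conversation).
    item_counts = Counter()
    for t in transactions:
        item_counts.update(set(t))

    # Apriori property: a pair can only be frequent if both of its items are frequent.
    frequent_items = {item for item, c in item_counts.items() if c / n >= min_support}

    # Count co-occurring pairs, restricted to frequent single items.
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules, a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = c / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical toy data: symptoms extracted from five conversations.
conversations = [
    {"fever", "headache"},
    {"fever", "headache", "cough"},
    {"rash", "itchiness"},
    {"rash", "itchiness", "fever"},
    {"cough"},
]
for rule in symptom_pair_rules(conversations, min_support=0.4, min_confidence=0.6):
    print(rule)
```

On the toy data, fever and headache co-occur in two of the five conversations (support 0.4); headache implies fever with confidence 1.0, while fever implies headache with confidence 2/3 because fever also appears once without headache.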

artificial intelligence, machine learning, natural language

2512.00768

Country: North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.69)
Health & Medicine > Health Care Technology > Telehealth (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Wei-Shou Hsu, Pascal Poupart

Online Bayesian Moment Matching for Topic Modeling with Unknown Number of Topics

Neural Information Processing Systems, Nov-20-2025, 21:35:44 GMT

Latent Dirichlet Allocation (LDA) is a very popular model for topic modeling as well as many other problems with latent groups. It is both simple and effective. When the number of topics (or latent groups) is unknown, the Hierarchical Dirichlet Process (HDP) provides an elegant non-parametric extension; however, it is a complex model, and it is difficult to incorporate prior knowledge since the distribution over topics is implicit. We propose two new models that extend LDA in a simple and intuitive fashion by directly expressing a distribution over the number of topics. We also propose a new online Bayesian moment matching technique to learn the parameters and the number of topics of those models based on streaming data. The approach achieves higher log-likelihood than batch and online HDP with fixed hyperparameters on several corpora. The code is publicly available at https://github.com/whsu/bmm.
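
The core operation in Bayesian moment matching can be illustrated on a single document's topic proportions. Under the simplifying assumption that the topic-word probabilities φ_{k,w} of an observed word are known, multiplying a Dirichlet prior Dir(α) by the word likelihood Σ_k θ_k φ_{k,w} gives a mixture of Dirichlets, because θ_k Dir(θ; α) = (α_k / α_0) Dir(θ; α + e_k). Moment matching then projects that mixture back to a single Dirichlet with the same first and second moments, which keeps each update cheap enough for streaming data. This is a minimal sketch of that general idea, not the authors' code from the repository linked above: it does not learn topic-word distributions or the number of topics, the function names are hypothetical, and averaging the per-coordinate estimates of α_0 is one simple choice among several.

```python
def dirichlet_moments(alpha):
    """First and second raw moments E[theta_i] and E[theta_i^2] of Dir(alpha)."""
    a0 = sum(alpha)
    m1 = [a / a0 for a in alpha]
    m2 = [a * (a + 1) / (a0 * (a0 + 1)) for a in alpha]
    return m1, m2


def project_to_dirichlet(weights, alphas):
    """Match the first and second moments of a Dirichlet mixture with one Dirichlet."""
    k = len(alphas[0])
    M1, M2 = [0.0] * k, [0.0] * k
    for w, alpha in zip(weights, alphas):
        m1, m2 = dirichlet_moments(alpha)
        for i in range(k):
            M1[i] += w * m1[i]
            M2[i] += w * m2[i]
    # For Dir(alpha): alpha_0 = (E[t_i] - E[t_i^2]) / (E[t_i^2] - E[t_i]^2) for every i.
    # Each coordinate yields an estimate; this sketch averages them.
    estimates = [(M1[i] - M2[i]) / (M2[i] - M1[i] ** 2) for i in range(k)]
    a0 = sum(estimates) / k
    # Matching the means then fixes each alpha_i = E[t_i] * alpha_0.
    return [M1[i] * a0 for i in range(k)]


def bmm_word_update(alpha, word_probs):
    """One streaming update of a document's Dir(alpha) after observing one word.

    word_probs[k]: assumed-known probability of the observed word under topic k.
    """
    a0 = sum(alpha)
    # Posterior mixture weights are proportional to (alpha_k / alpha_0) * phi_{k,w}.
    raw = [alpha[k] / a0 * word_probs[k] for k in range(len(alpha))]
    total = sum(raw)
    weights = [r / total for r in raw]
    # Mixture component k is Dir(alpha + e_k).
    components = [
        [a + (1 if j == k else 0) for j, a in enumerate(alpha)]
        for k in range(len(alpha))
    ]
    return project_to_dirichlet(weights, components)


alpha = [1.0, 1.0]
alpha = bmm_word_update(alpha, word_probs=[0.9, 0.1])
print(alpha)
```

After a word that topic 0 explains much better than topic 1, the projected Dirichlet shifts its mass toward topic 0, and projecting a single Dirichlet returns it unchanged.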

machine learning, natural language, posterior

Country:

Asia > Middle East > Jordan (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Wei-Shou Hsu, Pascal Poupart

Online Bayesian Moment Matching for Topic Modeling with Unknown Number of Topics

Neural Information Processing Systems, Nov-20-2025, 13:47:41 GMT

Neural Information Processing Systems http://nips.cc/

machine learning, natural language, posterior

Country:

Asia > Middle East > Jordan (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Magsarjav, Saranzaya, Humphries, Melissa, Tuke, Jonathan, Mitchell, Lewis

Quantifying consistency and accuracy of Latent Dirichlet Allocation

arXiv.org Artificial Intelligence, Nov-18-2025

Topic modelling in Natural Language Processing uncovers hidden topics in large, unlabelled text datasets. It is widely applied in fields such as information retrieval, content summarisation, and trend analysis across various disciplines. However, probabilistic topic models can produce different results when rerun due to their stochastic nature, leading to inconsistencies in latent topics. Factors like corpus shuffling, rare text removal, and document elimination contribute to these variations. This instability affects replicability, reliability, and interpretation, raising concerns about whether topic models capture meaningful topics or just noise. To address these problems, we define a new stability measure that incorporates accuracy and consistency, and we use the generative properties of LDA to generate a new corpus with ground truth. LDA is run 50 times on these generated corpora to determine the variability in its output. We show that LDA can correctly determine the underlying number of topics in the documents. We also find that LDA is internally consistent, as multiple reruns return similar topics; however, these topics are not the true topics.
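
The corpus-generation step can be sketched directly from LDA's generative process: draw each topic's word distribution from Dir(η), draw each document's topic proportions from Dir(α), then draw a topic and a word for every token. Because the sampled topics are returned alongside the documents, they can serve as ground truth when LDA is later fitted to the corpus repeatedly. This is a minimal, standard-library sketch, not the authors' code; the symmetric hyperparameter values, the fixed document length, and the function names are illustrative assumptions, and fitting LDA itself is left out.

```python
import random


def sample_dirichlet(rng, alpha):
    """Draw from Dir(alpha) by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_lda_corpus(num_docs, doc_length, num_topics, vocab_size,
                        alpha=0.1, eta=0.1, seed=0):
    """Sample a synthetic corpus from LDA's generative process.

    Returns (docs, topics, doc_topics): word-id lists, the true topic-word
    distributions, and the true per-document topic proportions.
    """
    rng = random.Random(seed)
    # Ground-truth topics: one word distribution per topic, drawn from Dir(eta).
    topics = [sample_dirichlet(rng, [eta] * vocab_size) for _ in range(num_topics)]
    docs, doc_topics = [], []
    for _ in range(num_docs):
        # Per-document topic proportions drawn from Dir(alpha).
        theta = sample_dirichlet(rng, [alpha] * num_topics)
        words = []
        for _ in range(doc_length):
            z = rng.choices(range(num_topics), weights=theta)[0]      # topic for this token
            w = rng.choices(range(vocab_size), weights=topics[z])[0]  # word from that topic
            words.append(w)
        docs.append(words)
        doc_topics.append(theta)
    return docs, topics, doc_topics


docs, topics, doc_topics = generate_lda_corpus(num_docs=5, doc_length=20,
                                               num_topics=3, vocab_size=50)
print(docs[0])
```

Fixing the seed makes the synthetic corpus reproducible, so any variation across repeated LDA fits on it comes from the fitting procedure rather than from the data.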

artificial intelligence, natural language, similarity measure

2511.1285

Country:

Oceania > Australia > South Australia > Adelaide (0.04)
Europe > Middle East > Malta > Port Region > Southern Harbour District > Floriana (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Neural Information Processing Systems, Oct-2-2025, 11:40:51 GMT

9be40cee5b0eee1462c82c6964087ff9-Paper.pdf

artificial intelligence, machine learning, natural language

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Bukhari, Syed Ahmad Chan, Keshtkar, Fazel, Meczkowska, Alyssa

A Narrative-Driven Computational Framework for Clinician Burnout Surveillance

arXiv.org Artificial Intelligence, Sep-8-2025

Clinician burnout poses a substantial threat to patient safety, particularly in high-acuity intensive care units (ICUs). Existing research predominantly relies on retrospective survey tools or broad electronic health record (EHR) metadata, often overlooking the valuable narrative information embedded in clinical notes. In this study, we analyze 10,000 ICU discharge summaries from MIMIC-IV, a publicly available database derived from the electronic health records of Beth Israel Deaconess Medical Center. The dataset encompasses diverse patient data, including vital signs, medical orders, diagnoses, procedures, treatments, and deidentified free-text clinical notes. We introduce a hybrid pipeline that combines BioBERT sentiment embeddings fine-tuned for clinical narratives, a stress lexicon tailored for clinician burnout surveillance, and five-topic latent Dirichlet allocation (LDA) with workload proxies. A provider-level logistic regression classifier achieves a precision of 0.80, a recall of 0.89, and an F1 score of 0.84 on a stratified hold-out set, surpassing metadata-only baselines by at least 0.17 in F1 score. Specialty-specific analysis indicates elevated burnout risk among providers in Radiology, Psychiatry, and Neurology. Our findings demonstrate that ICU clinical narratives contain actionable signals for proactive well-being monitoring.
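
As a quick arithmetic check, the reported F1 score is consistent with the reported precision P and recall R, since F1 is their harmonic mean:

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.80 \times 0.89}{0.80 + 0.89} = \frac{1.424}{1.69} \approx 0.843
```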

artificial intelligence, machine learning, natural language

2509.04497

Country:

Asia > Middle East > Israel (0.24)
North America > United States > New York > Queens County > New York City (0.05)
North America > United States > Hawaii (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Gong, Xian, McCarthy, Paul X., Tian, Lin, Rizoiu, Marian-Andrei

Signals from the Floods: AI-Driven Disaster Analysis through Multi-Source Data Fusion

arXiv.org Artificial Intelligence, May-26-2025

Massive and diverse web data are increasingly vital for government disaster response, as demonstrated by the 2022 floods in New South Wales (NSW), Australia. This study examines how X (formerly Twitter) and public inquiry submissions provide insights into public behaviour during crises. We analyse more than 55,000 flood-related tweets and 1,450 submissions to identify behavioural patterns during extreme weather events. While social media posts are short and fragmented, inquiry submissions are detailed, multi-page documents offering structured insights. Our methodology integrates Latent Dirichlet Allocation (LDA) for topic modelling with Large Language Models (LLMs) to enhance semantic understanding. LDA reveals distinct opinions and geographic patterns, while LLMs improve filtering by identifying flood-relevant tweets using public submissions as a reference. This Relevance Index method reduces noise and prioritizes actionable content, improving situational awareness for emergency responders. By combining these complementary data streams, our approach introduces a novel AI-driven method to refine crisis-related social media content, improve real-time disaster response, and inform long-term resilience planning.

large language model, natural language, submission

2505.17038

Country:

Oceania > Australia > Queensland (0.05)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.05)
Europe > Germany (0.05)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.68)
Information Technology > Services (0.68)
Government > Regional Government > Oceania Government > Australia Government (0.31)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.76)

Wang, Ziang, Aryani, Amir

Technical Report on classification of literature related to children speech disorder

arXiv.org Artificial Intelligence, May-21-2025

This technical report presents a natural language processing (NLP)-based approach for systematically classifying scientific literature on childhood speech disorders. We retrieved and filtered 4,804 relevant articles published after 2015 from the PubMed database using domain-specific keywords. After cleaning and pre-processing the abstracts, we applied two topic modeling techniques - Latent Dirichlet Allocation (LDA) and BERTopic - to identify latent thematic structures in the corpus. Our models uncovered 14 clinically meaningful clusters, such as infantile hyperactivity and abnormal epileptic behavior. To improve relevance and precision, we incorporated a custom stop word list tailored to speech pathology. Evaluation results showed that the LDA model achieved a coherence score of 0.42 and a log-perplexity score of -7.5, indicating strong topic coherence and predictive performance. The BERTopic model exhibited a low proportion of outlier topics (less than 20%), demonstrating its capacity to classify heterogeneous literature effectively. These results provide a foundation for automating literature reviews in speech-language pathology.
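
Perplexity itself is always at least 1, so the negative value reported for the LDA model is most plausibly a log-scale score. The report does not say how it was computed; one common source of such values, assumed here only for illustration, is gensim's `LdaModel.log_perplexity`, which returns a negative per-word likelihood bound and logs a perplexity estimate of 2 raised to the negated bound. The sketch below shows the standard definition (exponentiated negative mean log-probability of held-out tokens) next to that bound conversion; both function names are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity of held-out tokens from natural-log probabilities log p(w)."""
    avg_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-avg_log_prob)


def perplexity_from_bound(per_word_bound):
    """Perplexity estimate as 2 ** (-bound), the convention used in gensim's log output."""
    return 2.0 ** (-per_word_bound)


# Sanity check: a model that is uniform over 4 words has perplexity 4.
print(perplexity([math.log(0.25)] * 8))
# Hypothetical reading of the reported score as a per-word bound.
print(perplexity_from_bound(-7.5))
```

Read this way, a bound of -7.5 would correspond to a perplexity of about 181, but that interpretation remains an assumption.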

artificial intelligence, natural language, text processing

2505.14242

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.47)
Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

arXiv.org Artificial Intelligence, Mar-4-2025

ttta: Tools for Temporal Text Analysis

Lange, Kai-Robin, Benner, Niklas, Grönberg, Lars, Hachcham, Aymane, Kolli, Imene, Rieger, Jonas, Jentsch, Carsten

In its current state, the ttta package includes diachronic embeddings, dynamic topic modeling, and document scaling. These tools can be used to track changes in language use, identify emerging topics, and explore how the meaning of words and phrases has evolved over time. Our dynamic topic model approach is based on the model RollingLDA (Rieger et al., 2021), which is a modification of the classic Latent Dirichlet Allocation (Blei et al., 2003), that allows for the estimation of topics over time using a rolling window approach. We additionally implemented the model LDAPrototype (Rieger et al., 2020), serving as a more consistent foundation for RollingLDA than a common LDA. With these models, users can uncover and analyze topics of discussion in temporal data sets and track even rapid changes, which other dynamic topic models struggle with. This ability to track rapid changes in topics is further used in the Topical Changes model put forth by Rieger et al. (2022) and Lange et al. (2022) that identifies change points in the word topic distribution of RollingLDA. Figure 1 visualizes the changes observed by the Topical Changes model in speeches from the German Bundestag (Lange & Jentsch, 2023), which can be analyzed further using leave-one-out word impacts provided by the model or, as Lange et al. (2025) proposed, by asking Large Language Models to interpret the change and relate it to a possible narrative shift.
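
RollingLDA and the Topical Changes model are implemented in the ttta package; the sketch below neither uses nor reproduces that package's API. Under simplifying assumptions, it illustrates two ideas from the paragraph: splitting time-ordered documents into chunks, each paired with a memory of preceding chunks (the way a rolling window approach would feed them to a model), and flagging a chunk when a topic's word distribution drifts from its recent history, measured here by cosine similarity against a fixed threshold. The function names, the memory and lookback parameters, and the fixed threshold are illustrative choices of ours.

```python
import math


def rolling_windows(chunks, memory):
    """Yield (t, memory_docs, current_docs) for each time chunk after the first.

    chunks: list of document lists ordered by time; memory: number of previous chunks kept.
    """
    for t in range(1, len(chunks)):
        start = max(0, t - memory)
        memory_docs = [doc for chunk in chunks[start:t] for doc in chunk]
        yield t, memory_docs, chunks[t]


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def flag_topic_changes(topic_counts_by_chunk, lookback, threshold):
    """Flag chunks where one topic's word counts diverge from the preceding lookback chunks.

    topic_counts_by_chunk[t][w]: count of word w assigned to the topic in chunk t.
    """
    flagged = []
    for t in range(lookback, len(topic_counts_by_chunk)):
        window = topic_counts_by_chunk[t - lookback:t]
        # Reference distribution: element-wise sum of the lookback chunks.
        reference = [sum(col) for col in zip(*window)]
        if cosine(topic_counts_by_chunk[t], reference) < threshold:
            flagged.append(t)
    return flagged


chunks = [["d1"], ["d2"], ["d3", "d4"]]
for t, memory_docs, current in rolling_windows(chunks, memory=1):
    print(t, memory_docs, current)

counts = [[5, 5, 0], [6, 4, 0], [5, 5, 0], [0, 1, 9]]
print(flag_topic_changes(counts, lookback=2, threshold=0.8))
```

In the toy counts, the last chunk shifts almost all of the topic's mass to a previously unused word, so only that chunk falls below the similarity threshold.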

artificial intelligence, natural language, text processing

2503.02625

Country:

Europe > Ukraine (0.07)
Europe > Switzerland > Zürich > Zürich (0.05)
Europe > Russia (0.05)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.79)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.77)