topic inference
- Asia > Middle East > Jordan (0.05)
- North America > United States > Michigan (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Geometric Dirichlet Means Algorithm for topic inference
We propose a geometric algorithm for topic learning and inference that is built on the convex geometry of topics arising from the Latent Dirichlet Allocation (LDA) model and its nonparametric extensions. To this end we study the optimization of a geometric loss function, which is a surrogate to the LDA's likelihood. Our method involves a fast optimization based weighted clustering procedure augmented with geometric corrections, which overcomes the computational and statistical inefficiencies encountered by other techniques based on Gibbs sampling and variational inference, while achieving the accuracy comparable to that of a Gibbs sampler. The topic estimates produced by our method are shown to be statistically consistent under some conditions. The algorithm is evaluated with extensive experiments on simulated and real data.
Reviews: Geometric Dirichlet Means Algorithm for topic inference
I like this paper for two different reasons. After RecoverKL and the spectral algorithm, this paper brings a very novel and useful perspective into the topic inference problem for LDA, without apparently making strong assumptions about topics, such as separability via anchor words, etc. Secondly, it seems to be extremely good in practice meeting the speed of RecoverKL with the accuracy of Gibbs sampling algorithms. A. The algorithm: Aspects of this work were known before. For example, Blei pointed out the convex geometry in the original LDA paper, and the connection between LDA/NMF and K-Means was also known. However, the novel aspect of this paper is that it has used these connections to propose an inference algorithm for LDA completely based on the geometry of the topic and word simplexes. This is done by making an additional connection between the topic inference problem and that of Centroidal Voronoi Tesselations of a convex simplex.
MixEHR-Nest: Identifying Subphenotypes within Electronic Health Records through Hierarchical Guided-Topic Modeling
Wang, Ruohan, Wang, Zilong, Song, Ziyang, Buckeridge, David, Li, Yue
Automatic subphenotyping from electronic health records (EHRs)provides numerous opportunities to understand diseases with unique subgroups and enhance personalized medicine for patients. However, existing machine learning algorithms either focus on specific diseases for better interpretability or produce coarse-grained phenotype topics without considering nuanced disease patterns. In this study, we propose a guided topic model, MixEHR-Nest, to infer sub-phenotype topics from thousands of disease using multi-modal EHR data. Specifically, MixEHR-Nest detects multiple subtopics from each phenotype topic, whose prior is guided by the expert-curated phenotype concepts such as Phenotype Codes (PheCodes) or Clinical Classification Software (CCS) codes. We evaluated MixEHR-Nest on two EHR datasets: (1) the MIMIC-III dataset consisting of over 38 thousand patients from intensive care unit (ICU) from Beth Israel Deaconess Medical Center (BIDMC) in Boston, USA; (2) the healthcare administrative database PopHR, comprising 1.3 million patients from Montreal, Canada. Experimental results demonstrate that MixEHR-Nest can identify subphenotypes with distinct patterns within each phenotype, which are predictive for disease progression and severity. Consequently, MixEHR-Nest distinguishes between type 1 and type 2 diabetes by inferring subphenotypes using CCS codes, which do not differentiate these two subtype concepts. Additionally, MixEHR-Nest not only improved the prediction accuracy of short-term mortality of ICU patients and initial insulin treatment in diabetic patients but also revealed the contributions of subphenotypes. For longitudinal analysis, MixEHR-Nest identified subphenotypes of distinct age prevalence under the same phenotypes, such as asthma, leukemia, epilepsy, and depression. The MixEHR-Nest software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-Nest.
- North America > Canada > Quebec > Montreal (0.67)
- Asia > Middle East > Israel (0.24)
- Asia > China > Guangdong Province > Shenzhen (0.05)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.68)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Hematology (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- (2 more...)
Zero-Shot Multi-Label Topic Inference with Sentence Encoders
Sarkar, Souvika, Feng, Dongji, Santu, Shubhra Kanti Karmaker
In this paper, we focus on Zero-shot approaches 2018b)] for topic inference tasks and subsequently, (Yin et al., 2019; Xie et al., 2016; Veeranna establish a benchmark for future study in this crucial et al., 2016) for inferring topics from documents direction. To achieve this, we conducted extensive where document and topics were never seen experiments with multiple real-world datasets, previously by a model. Furthermore, for developing including online product reviews, news articles, Zero-shot methods, we exclusively focus on and health-related blog articles. We also implemented leveraging the recent powerful sentence encoders.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- (7 more...)
Geometric Dirichlet Means Algorithm for topic inference
Yurochkin, Mikhail, Nguyen, XuanLong
We propose a geometric algorithm for topic learning and inference that is built on the convex geometry of topics arising from the Latent Dirichlet Allocation (LDA) model and its nonparametric extensions. To this end we study the optimization of a geometric loss function, which is a surrogate to the LDA's likelihood. Our method involves a fast optimization based weighted clustering procedure augmented with geometric corrections, which overcomes the computational and statistical inefficiencies encountered by other techniques based on Gibbs sampling and variational inference, while achieving the accuracy comparable to that of a Gibbs sampler. The topic estimates produced by our method are shown to be statistically consistent under some conditions. The algorithm is evaluated with extensive experiments on simulated and real data.