AITopics | Do, Linh

Collaborating Authors

Do, Linh

Dendrogram of mixing measures: Hierarchical clustering and model selection for finite mixture models

Do, Dat, Do, Linh, McKinley, Scott A., Terhorst, Jonathan, Nguyen, XuanLong

arXiv.org Machine Learning, Mar-8-2024

In modern data analysis, it is often useful to reduce the complexity of a large dataset by clustering the observations into a small and interpretable collection of subpopulations. Broadly speaking, there are two major approaches. In "model-based" clustering, the data are assumed to be generated by a (usually small) collection of simple probability distributions such as normal distributions, and clusters are inferred by fitting a probabilistic mixture model. Because of their transparent probabilistic assumptions, the statistical properties of mixture models are well understood. In particular, if there is no model misspecification, i.e., the data truly come from a mixture distribution, then the subpopulations can be consistently estimated. Unfortunately, this appealing asymptotic guarantee is somewhat at odds with what is often observed in practice, where mixture models fitted to complex datasets often return an uninterpretably large number of components, many of which are quite similar to each other. The tendency of mixture models to overfit on real data leads many analysts to employ "model-free" clustering methods instead. A well-known example is hierarchical clustering, which organizes the data into a nested sequence of partitions at different resolutions. It is particularly useful for data exploration as it does not require fixing a number of subpopulations a priori and can be visualized using a dendrogram.
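To make the "nested sequence of partitions" concrete, here is a minimal sketch of generic single-linkage agglomerative clustering on one-dimensional data. It is not the method proposed in the paper; the function name `single_linkage_partitions` and the sample points are hypothetical and chosen only for illustration.

```python
def single_linkage_partitions(points):
    """Return the nested partitions produced by single-linkage
    agglomerative clustering of 1-D points, from n singletons down to 1 cluster.

    Assumption: data are 1-D, so after sorting, the two closest clusters
    under single linkage are always neighbours in the sorted order.
    """
    clusters = [[p] for p in sorted(points)]
    partitions = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Single-linkage distance between adjacent clusters is the gap
        # between the last point of one and the first point of the next.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))  # ties are broken by taking the leftmost gap
        merged = clusters[i] + clusters[i + 1]
        clusters = clusters[:i] + [merged] + clusters[i + 2:]
        partitions.append([list(c) for c in clusters])
    return partitions


if __name__ == "__main__":
    for level in single_linkage_partitions([0.0, 0.1, 5.0, 5.2, 9.0]):
        print(len(level), level)
```

Each printed level is one horizontal cut of the dendrogram: the analyst can read off any number of clusters afterwards instead of fixing it in advance.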

artificial intelligence, dendrogram, machine learning, (16 more...)

arXiv: 2403.01684

Country:

Europe > United Kingdom > England (0.14)
North America > United States > Michigan (0.14)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Strong identifiability and parameter learning in regression with heterogeneous response

Do, Dat, Do, Linh, Nguyen, XuanLong

arXiv.org Machine Learning, Dec-8-2022

Regression is often associated with the task of curve fitting: given data samples for pairs of random variables (X, Y), find a function y = F(x) that captures the relationship between X and Y as well as possible. As the underlying population for the (X, Y) pairs becomes increasingly complex, much effort has been devoted to learning more complex models for the (regression) function F; see [20, 49, 15] for some recent examples. In many data domains, however, due to the heterogeneity of the behavior of the response variable Y with respect to covariate X, no single function F can fit the data pairs well, no matter how complex F is. Many authors noticed this challenge and adopted a mixture modeling framework for the regression problem, starting with the earlier work of [51, 6, 14]. To capture the uncertain and highly heterogeneous behavior of response variable Y given covariate X, one needs more than a single regression model. If there are k different regression behaviors, one can represent the conditional distribution of Y given X by a mixture of k conditional density functions associated with k underlying (latent) subpopulations. One can draw on existing modeling tools for conditional densities, such as generalized linear models [39] or more complex components [28, 63, 22], to improve the model fit for the regression task. Recently, mixtures of regression models (alternatively, regression mixture models) have found applications in a vast range of domains, including risk estimation [2], education [7], medicine [34, 43, 56] and transportation analysis [46, 47, 64]. Inference in mixtures of regression models can be carried out in a classical frequentist framework (e.g., maximum conditional likelihood estimation [6]) or a Bayesian framework [27].
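As an illustration of a mixture of k conditional densities, the sketch below evaluates p(y | x) = sum_j w_j N(y; a_j + b_j x, s_j^2) for Gaussian linear components. The Gaussian linear form, the function names, and the parameter values are assumptions made for this example only; the abstract itself allows far more general component families.

```python
import math


def normal_pdf(y, mean, sigma):
    """Density of a normal distribution with the given mean and std. dev."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_regression_density(y, x, components):
    """Conditional density of Y = y given X = x under a mixture of
    Gaussian linear regressions.

    components: list of (weight, intercept, slope, sigma) tuples, one per
    latent subpopulation; the weights are assumed to sum to 1.
    """
    return sum(w * normal_pdf(y, a + b * x, s) for (w, a, b, s) in components)


if __name__ == "__main__":
    # Two hypothetical subpopulations with opposite slopes: no single
    # curve y = F(x) describes both, but the mixture does.
    comps = [(0.5, 0.0, 1.0, 1.0), (0.5, 0.0, -1.0, 1.0)]
    print(mixture_regression_density(2.0, 2.0, comps))
    print(mixture_regression_density(0.0, 2.0, comps))
```

At x = 2 the conditional density is bimodal, with modes near y = 2 and y = -2, which is the kind of heterogeneous response that a single regression function cannot capture.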

artificial intelligence, machine learning, regression model, (18 more...)

arXiv: 2212.04091

Country:

North America > Canada (0.27)
North America > United States > Michigan (0.14)

Genre: Research Report > Experimental Study (0.67)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
