Vu, Mai Anh
Conditional expectation with regularization for missing data imputation
Vu, Mai Anh, Nguyen, Thu, Do, Tu T., Phan, Nhan, Chawla, Nitesh V., Halvorsen, Pål, Riegler, Michael A., Nguyen, Binh T.
Missing data frequently occurs in datasets across various domains, such as medicine, sports, and finance. To enable proper and reliable analyses of such data, the missing values are often imputed, and it is important that the imputation method achieves a low root mean square error (RMSE) between the imputed and the true values. In addition, for some critical applications, it is also a requirement that the imputation method is scalable and that the logic behind the imputation is explainable, which is especially difficult for complex methods such as those based on deep learning. Based on these considerations, we propose a new algorithm named "Conditional Distribution-based Imputation of Missing Values with Regularization" (DIMV). DIMV operates by determining the conditional distribution of a feature that has missing entries, using the information from the fully observed features as a basis. As illustrated by the experiments in the paper, DIMV (i) gives a low RMSE for the imputed values compared to state-of-the-art methods; (ii) is fast and scalable; (iii) is explainable, since its imputations can be read as coefficients in a regression model, allowing reliable and trustworthy analysis and making it a suitable choice for critical domains where understanding is important, such as medicine and finance; (iv) can provide an approximated confidence region for the missing values in a given sample; (v) is suitable for both small- and large-scale data; (vi) in many scenarios, does not require the huge number of parameters that deep learning approaches do; (vii) handles multicollinearity in imputation effectively; and (viii) is robust to violations of the normality assumption that its theoretical grounding relies on.
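A minimal sketch of the general idea behind conditional-Gaussian imputation with a ridge-style regularizer, in the spirit of what the abstract describes; the variable names, the placement of the regularizer alpha on the observed-feature covariance block, and the use of complete rows to estimate the covariance are illustrative assumptions, not the authors' exact DIMV formulation.

```python
import numpy as np

def conditional_gaussian_impute(X, alpha=0.1):
    """Impute NaNs in X (n_samples x n_features) using the conditional normal mean."""
    X = np.asarray(X, dtype=float)
    mu = np.nanmean(X, axis=0)                      # feature means from observed entries
    # Covariance estimated on fully observed rows (a simplifying assumption here;
    # DIMV itself relies on parameter estimates that handle missing data directly).
    complete_rows = X[~np.isnan(X).any(axis=1)]
    S = np.cov(complete_rows, rowvar=False)

    X_imp = X.copy()
    for i in range(X.shape[0]):
        miss = np.isnan(X[i])
        if not miss.any() or miss.all():
            continue
        obs = ~miss
        S_oo = S[np.ix_(obs, obs)] + alpha * np.eye(obs.sum())   # regularized observed block
        S_mo = S[np.ix_(miss, obs)]
        # Conditional mean: mu_m + S_mo (S_oo + alpha I)^{-1} (x_o - mu_o)
        X_imp[i, miss] = mu[miss] + S_mo @ np.linalg.solve(S_oo, X[i, obs] - mu[obs])
    return X_imp
```

The regression-style coefficients S_mo (S_oo + alpha I)^{-1} are what make this kind of imputation readable as a linear model over the observed features, which is the sense in which the method is explainable.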
Correlation visualization under missing values: a comparison between imputation and direct parameter estimation methods
Pham, Nhat-Hao, Vo, Khanh-Linh, Vu, Mai Anh, Nguyen, Thu, Riegler, Michael A., Halvorsen, Pål, Nguyen, Binh T.
Correlation matrix visualization is essential for understanding the relationships between variables in a dataset, but missing data can seriously affect this important data visualization tool. In this paper, we compare the effects of various missing data methods on the correlation plot, focusing on two patterns: randomly missing data and monotone missing data. We aim to provide practical strategies and recommendations for researchers and practitioners in creating and analyzing the correlation plot under missing data. Our experimental results suggest that while imputation is commonly used for missing data, using imputed data to plot the correlation matrix may lead to significantly misleading inferences about the relations between the features. In addition, the most accurate technique for computing a correlation matrix (in terms of RMSE) does not always give the correlation plot that most resembles the one based on complete data (the ground truth). Based on its performance in the experiments, we recommend using DPER [1], a direct parameter estimation approach, for plotting the correlation matrix.
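A small illustrative comparison of how imputation can distort the correlation matrix, using mean imputation versus pairwise-complete correlation as a simple stand-in for a direct parameter estimation approach (DPER itself is not reproduced here); the synthetic dataset and the masking scheme are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
df = pd.DataFrame({
    "x1": z + 0.3 * rng.normal(size=n),   # x1 and x2 are strongly correlated via z
    "x2": z + 0.3 * rng.normal(size=n),
    "x3": rng.normal(size=n),             # x3 is independent of the others
})
truth = df.corr()

# Randomly hide 40% of the entries.
df_miss = df.mask(rng.random(df.shape) < 0.4)

corr_imputed = df_miss.fillna(df_miss.mean()).corr()   # correlation after mean imputation
corr_pairwise = df_miss.corr()                          # pairwise-complete observations

print("RMSE (mean imputation):", np.sqrt(((corr_imputed - truth) ** 2).to_numpy().mean()))
print("RMSE (pairwise direct):", np.sqrt(((corr_pairwise - truth) ** 2).to_numpy().mean()))
```

Mean imputation typically shrinks the off-diagonal entries toward zero, so the resulting correlation plot can understate strong relations such as the one between x1 and x2 in this toy example.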
Blockwise Principal Component Analysis for monotone missing data imputation and dimensionality reduction
Do, Tu T., Vu, Mai Anh, Ly, Hoang Thien, Nguyen, Thu, Hicks, Steven A., Riegler, Michael A., Halvorsen, Pål, Nguyen, Binh T.
Monotone missing data is a common problem in data analysis. However, imputation combined with dimensionality reduction can be computationally expensive, especially with the increasing size of datasets. To address this issue, we propose a Blockwise Principal Component Analysis Imputation (BPI) framework for dimensionality reduction and imputation of monotone missing data. The framework conducts Principal Component Analysis (PCA) on the observed part of each monotone block of the data, merges the obtained principal components, and then imputes the merged result using a chosen imputation technique. BPI can work with various imputation techniques and can significantly reduce imputation time compared to conducting dimensionality reduction after imputation. This makes it a practical and efficient approach for large datasets with monotone missing data. Our experiments validate the improvement in speed. In addition, our experiments also show that while applying MICE imputation directly on the missing data may not converge, applying BPI with MICE may achieve convergence.
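A rough, hypothetical sketch of the workflow described above: run PCA on the observed part of each monotone block, concatenate the resulting scores, and impute the remaining missing score entries with a chosen imputer. The block layout, component counts, and the use of scikit-learn's IterativeImputer as a MICE-style imputer are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def bpi_sketch(X, block_cols, n_components=2):
    """X: (n_samples, n_features) with a monotone pattern, i.e. each column block in
    `block_cols` is fully observed for its first rows and fully missing afterwards."""
    n = X.shape[0]
    scores = []
    for cols in block_cols:
        block = X[:, cols]
        observed_rows = ~np.isnan(block).any(axis=1)       # rows where this block is observed
        k = min(n_components, len(cols))
        block_scores = np.full((n, k), np.nan)
        block_scores[observed_rows] = PCA(n_components=k).fit_transform(block[observed_rows])
        scores.append(block_scores)
    merged = np.hstack(scores)                              # reduced representation, still with NaNs
    return IterativeImputer(random_state=0).fit_transform(merged)
```

Because the imputer only sees the low-dimensional principal component scores rather than the full feature matrix, the expensive imputation step operates on far fewer columns, which is where the speed-up comes from.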
Traffic Density Estimation using a Convolutional Neural Network
Nubert, Julian, Truong, Nicholas Giai, Lim, Abel, Tanujaya, Herbert Ilhan, Lim, Leah, Vu, Mai Anh
The goal of this project is to introduce and present a machine learning application that aims to improve the quality of life of people in Singapore. In particular, we investigate the use of machine learning solutions to tackle the problem of traffic congestion in Singapore. In layman's terms, we seek to make Singapore (or any other city) a smoother place to get around. To accomplish this aim, we present an end-to-end system comprising (1) a traffic density estimation algorithm for traffic lights/junctions and (2) a traffic signal control algorithm that makes use of the density information for better traffic control. Traffic density estimates can be obtained from traffic junction images using various machine learning techniques combined with computer vision tools. After researching various advanced machine learning methods, we decided on convolutional neural networks (CNNs). We conducted experiments on our algorithms using the publicly available traffic camera dataset published by the Land Transport Authority (LTA) to demonstrate the feasibility of this approach. With these traffic density estimates, different traffic control algorithms can be applied to minimize congestion at traffic junctions.
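A minimal, hypothetical sketch of a CNN that regresses a scalar traffic density score from a junction camera frame; the architecture, input resolution, and dummy batch below are illustrative assumptions and not the network or data pipeline used in the project.

```python
import torch
import torch.nn as nn

class DensityCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # pool to a fixed-size feature vector
        )
        self.head = nn.Linear(64, 1)            # single scalar: estimated density at the junction

    def forward(self, x):                       # x: (batch, 3, H, W) camera frames
        return self.head(self.features(x).flatten(1))

# Example usage on a dummy batch of 240x320 frames.
model = DensityCNN()
density = model(torch.randn(4, 3, 240, 320))    # shape (4, 1)
```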