Ma, Chao
Missing Data Imputation and Acquisition with Deep Hierarchical Models and Hamiltonian Monte Carlo
Peis, Ignacio, Ma, Chao, Hernández-Lobato, José Miguel
Variational Autoencoders (VAEs) have recently been highly successful at imputing and acquiring heterogeneous missing data and identifying outliers. However, within this specific application domain, existing VAE methods are restricted to a single layer of latent variables and to strictly Gaussian posterior approximations. To address these limitations, we present HH-VAEM, a Hierarchical VAE model for mixed-type incomplete data that uses Hamiltonian Monte Carlo with automatic hyper-parameter tuning for improved approximate inference. Our experiments show that HH-VAEM outperforms existing baselines on missing data imputation, supervised learning and outlier identification with missing features. Finally, we also present a sampling-based approach for efficiently computing the information gain when missing features are to be acquired with HH-VAEM. Our experiments show that this sampling-based approach is superior to alternatives based on Gaussian approximations.
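For orientation, the sketch below shows a generic Hamiltonian Monte Carlo transition of the kind alluded to above for refining an approximate posterior over latent variables. It is a minimal, self-contained NumPy illustration, not the HH-VAEM implementation; the function names (leapfrog, hmc_step, log_p) and the step-size settings are hypothetical.

import numpy as np

def leapfrog(z, p, grad_log_p, step_size, n_steps):
    """Leapfrog integration of Hamiltonian dynamics for the latent z (toy sketch)."""
    p = p + 0.5 * step_size * grad_log_p(z)
    for _ in range(n_steps - 1):
        z = z + step_size * p
        p = p + step_size * grad_log_p(z)
    z = z + step_size * p
    p = p + 0.5 * step_size * grad_log_p(z)
    return z, p

def hmc_step(z, log_p, grad_log_p, step_size=0.05, n_steps=10, rng=np.random):
    """One HMC transition targeting log_p; returns the accepted (or unchanged) sample."""
    p0 = rng.standard_normal(z.shape)
    z_new, p_new = leapfrog(z.copy(), p0.copy(), grad_log_p, step_size, n_steps)
    # Metropolis correction on the joint (position, momentum) energy.
    h0 = -log_p(z) + 0.5 * np.sum(p0 ** 2)
    h1 = -log_p(z_new) + 0.5 * np.sum(p_new ** 2)
    return z_new if np.log(rng.uniform()) < h0 - h1 else z

# Usage example: draw samples from a standard normal latent posterior.
z = np.zeros(2)
for _ in range(100):
    z = hmc_step(z, log_p=lambda z: -0.5 * np.sum(z ** 2), grad_log_p=lambda z: -z)

In HH-VAEM the target log-density would come from the hierarchical decoder and the step sizes are tuned automatically; the sketch only conveys the basic transition.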
Deep End-to-end Causal Inference
Geffner, Tomas, Antoran, Javier, Foster, Adam, Gong, Wenbo, Ma, Chao, Kiciman, Emre, Sharma, Amit, Lamb, Angus, Kukla, Martin, Pawlowski, Nick, Allamanis, Miltiadis, Zhang, Cheng
Causal inference is essential for data-driven decision making across domains such as business engagement, medical treatment or policy making. However, research on causal discovery and causal inference has evolved separately, and combining the two domains is not trivial. In this work, we develop Deep End-to-end Causal Inference (DECI), a single flow-based method that takes in observational data and can perform both causal discovery and inference, including conditional average treatment effect (CATE) estimation. We provide a theoretical guarantee that DECI can recover the ground-truth causal graph under mild assumptions. In addition, our method can handle heterogeneous, real-world, mixed-type data with missing values, allowing for both continuous and discrete treatment decisions. Moreover, the design principle of our method generalizes beyond DECI, providing a general End-to-end Causal Inference (ECI) recipe that enables different ECI frameworks to be built from existing methods. Our results show the superior performance of DECI compared to relevant baselines for both causal discovery and (C)ATE estimation in over a thousand experiments on synthetic datasets and causal machine learning benchmarks.
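As a toy illustration of the inference half of this pipeline, the snippet below estimates an average treatment effect by simulating do-interventions through a known structural mechanism. This is not DECI (which learns the graph and mechanisms from data, and estimates CATE by additionally conditioning on covariates); the linear mechanism and the effect size 2.0 are made up for the example.

import numpy as np

rng = np.random.default_rng(0)

# Toy structural causal model (not DECI): outcome is a linear function of treatment plus noise.
def simulate_outcome(treatment, noise):
    return 2.0 * treatment + noise          # ground-truth treatment effect is 2.0

n = 100_000
noise = rng.standard_normal(n)

# Estimate the ATE by contrasting the two interventional regimes do(T=1) vs do(T=0),
# reusing the same exogenous noise so only the treatment differs.
y_do1 = simulate_outcome(np.ones(n), noise)
y_do0 = simulate_outcome(np.zeros(n), noise)
print("estimated ATE:", (y_do1 - y_do0).mean())   # close to 2.0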
Identifiable Generative Models for Missing Not at Random Data Imputation
Ma, Chao, Zhang, Cheng
Real-world datasets often have missing values associated with complex generative processes, where the cause of the missingness may not be fully observed. This is known as missing not at random (MNAR) data. However, many imputation methods do not take the missingness mechanism into account, resulting in biased imputations when MNAR data are present. Although a few methods have considered the MNAR scenario, their identifiability under MNAR is generally not guaranteed: model parameters cannot be uniquely determined even with infinitely many data samples, so the imputations given by such models can still be biased. This issue is largely overlooked by many modern deep generative models. In this work, we fill this gap by systematically analyzing the identifiability of generative models under MNAR. Furthermore, we propose a practical deep generative model which provides identifiability guarantees under mild assumptions for a wide range of MNAR mechanisms. Our method demonstrates a clear advantage on both synthetic data and multiple real-world scenarios with MNAR data.
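For orientation, a standard selection-model view of MNAR (written in our own notation, not necessarily the paper's) factorizes the joint law of the data $x=(x_{\text{obs}}, x_{\text{mis}})$ and the missingness mask $m$ as
\[
p_{\theta,\phi}(x_{\text{obs}}, m) \;=\; \int p_{\theta}(x_{\text{obs}}, x_{\text{mis}})\, p_{\phi}(m \mid x_{\text{obs}}, x_{\text{mis}})\, \mathrm{d}x_{\text{mis}},
\]
where under MNAR the mask may depend on the unobserved part $x_{\text{mis}}$. Dropping the $p_{\phi}$ factor therefore biases the imputation model, and identifiability asks whether $(\theta,\phi)$ is uniquely determined by this observed-data likelihood.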
On Perceptual Lossy Compression: The Cost of Perceptual Reconstruction and An Optimal Training Framework
Yan, Zeyu, Wen, Fei, Ying, Rendong, Ma, Chao, Liu, Peilin
Lossy compression algorithms are typically designed to achieve the lowest possible distortion at a given bit rate. However, recent studies show that pursuing high perceptual quality leads to an increase in the lowest achievable distortion (e.g., MSE). This paper provides nontrivial theoretical results revealing that 1) the cost of achieving perfect perceptual quality is exactly a doubling of the lowest achievable MSE distortion, 2) an optimal encoder for the "classic" rate-distortion problem is also optimal for the perceptual compression problem, and 3) a distortion loss is unnecessary for training a perceptual decoder. Further, we propose a novel training framework that achieves the lowest MSE distortion under the perfect-perception constraint at a given bit rate. This framework uses a GAN whose discriminator is conditioned on an MSE-optimized encoder, and is superior to the traditional framework that uses a distortion loss plus an adversarial loss. Experiments verify the theoretical findings and demonstrate the superiority of the proposed training framework.
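In rate-distortion-perception notation (ours, not necessarily the paper's), writing $D(R,P)$ for the lowest achievable MSE at rate $R$ under perception constraint $P$, the first claim above reads
\[
D(R, P{=}0) \;=\; 2\, D(R, P{=}\infty),
\]
i.e. the minimal MSE under perfect perceptual quality ($P{=}0$, output distribution matching the source) is exactly twice the unconstrained minimum; the precise conditions under which this holds are given in the paper.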
Nonlinear Weighted Directed Acyclic Graph and A Priori Estimates for Neural Networks
Li, Yuqing, Luo, Tao, Ma, Chao
In an attempt to better understand the structural benefits and generalization power of deep neural networks, we first present a novel graph-theoretical formulation of neural network models, including fully connected networks, residual networks~(ResNet) and densely connected networks~(DenseNet). Second, we extend the error analysis of the population risk for two-layer networks~\cite{ew2019prioriTwo} and ResNet~\cite{e2019prioriRes} to DenseNet, and show further that similar estimates can be obtained for neural networks satisfying certain mild conditions. These estimates are a priori in nature since they depend solely on information available prior to the training process; in particular, the bounds on the estimation errors are independent of the input dimension.
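Schematically, and in our own notation, an a priori estimate of the kind extended here bounds the population risk of a suitably regularized $m$-neuron estimator $\hat{f}$ purely in terms of a norm of the target function $f^{*}$, for example (up to constants and logarithmic factors)
\[
R(\hat{f}) \;\lesssim\; \frac{\|f^{*}\|^{2}}{m} \;+\; \frac{\|f^{*}\|}{\sqrt{n}},
\]
with $n$ training samples. The point is that no quantity measured after training appears, and the bound carries no explicit dependence on the input dimension; the exact norms and constants for DenseNet and the more general architectures are those derived in the paper.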
Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning
Zhang, Weixia, Ma, Chao, Wu, Qi, Yang, Xiaokang
The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to a target location in unseen photo-realistic environments according to a given language instruction. The challenges of VLN arise mainly from two aspects: first, the agent needs to attend to the parts of the language instruction that correspond to the dynamically varying visual environment; second, during training the agent usually imitates the shortest path to the target location. Due to this discrepancy in action selection between training and inference, an agent trained solely by imitation learning does not perform well. Sampling the next action from the predicted probability distribution during training allows the agent to explore diverse routes in the environments, yielding higher success rates. Nevertheless, without being presented with the shortest navigation paths during training, the agent may arrive at the target location through an unexpectedly long route. To overcome these challenges, we design a cross-modal grounding module, composed of two complementary attention mechanisms, to equip the agent with a better ability to track the correspondence between the textual and visual modalities. We then propose to recursively alternate the learning schemes of imitation and exploration to narrow the discrepancy between training and inference. We further exploit the advantages of both learning schemes via adversarial learning. Extensive experimental results on the Room-to-Room (R2R) benchmark demonstrate that the proposed learning scheme generalizes well and is complementary to prior art. Our method performs well against state-of-the-art approaches in terms of both effectiveness and efficiency.
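The alternation between the two learning schemes can be pictured with the toy sketch below. It only contrasts the two action-selection modes discussed above (teacher forcing on the shortest-path expert action versus sampling from the agent's own policy) and is not the paper's training code; all names and the action-space size are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def select_action(logits, expert_action, mode):
    """Pick the next navigation action under one of two training schemes (toy sketch)."""
    if mode == "imitation":                   # teacher forcing: copy the shortest-path expert
        return expert_action
    probs = np.exp(logits - logits.max())     # softmax over candidate directions
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)   # exploration: sample from the agent's policy

# Alternate the two schemes across training iterations, as described above.
for step in range(4):
    mode = "imitation" if step % 2 == 0 else "exploration"
    logits = rng.normal(size=6)               # scores over 6 candidate directions (toy)
    action = select_action(logits, expert_action=2, mode=mode)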
Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning
Zhou, Pan, Feng, Jiashi, Ma, Chao, Xiong, Caiming, HOI, Steven, E, Weinan
It is not yet clear why ADAM-like adaptive gradient algorithms suffer from worse generalization performance than SGD despite their faster training speed. This work aims to provide an understanding of this generalization gap by analyzing the local convergence behaviors of these algorithms. Specifically, we observe heavy tails in the gradient noise of these algorithms. This motivates us to analyze them through their Lévy-driven stochastic differential equations (SDEs), because of the similar convergence behaviors of an algorithm and its SDE. We then establish the escaping time of these SDEs from a local basin. The result shows that (1) the escaping time of both SGD and ADAM depends positively on the Radon measure of the basin and negatively on the heaviness of the gradient noise; (2) for the same basin, SGD enjoys a smaller escaping time than ADAM, mainly because (a) the geometry adaptation in ADAM, via adaptively scaling each gradient coordinate, diminishes the anisotropic structure in the gradient noise and results in a larger Radon measure of the basin; (b) the exponential gradient averaging in ADAM smooths its gradient and leads to lighter gradient noise tails than SGD. So SGD is more locally unstable than ADAM at sharp minima, defined as minima whose local basins have small Radon measure, and can better escape from them to flatter ones with larger Radon measure. As flat minima, which here refer to minima at flat or asymmetric basins/valleys, often generalize better than sharp ones~\cite{keskar2016large,he2019asymmetric}, our result explains the better generalization performance of SGD over ADAM. Finally, experimental results confirm our heavy-tailed gradient noise assumption and theoretical findings.
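For orientation, analyses of this kind replace the discrete update with a Lévy-driven SDE of the schematic form (our notation)
\[
\mathrm{d}\theta_t \;=\; -\nabla L(\theta_t)\,\mathrm{d}t \;+\; \varepsilon\, \mathrm{d}L^{\alpha}_t,
\]
where $L^{\alpha}_t$ is an $\alpha$-stable Lévy process modeling heavy-tailed gradient noise; the escaping time from a basin is then studied for this process, with heavier tails (smaller $\alpha$) and a smaller Radon measure of the basin both shortening the escape. The dynamics actually analyzed in the paper, including how ADAM's preconditioning and moment averaging enter, are more involved than this schematic.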
Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't
E, Weinan, Ma, Chao, Wojtowytsch, Stephan, Wu, Lei
The purpose of this article is to review the achievements made in the last few years towards understanding the reasons behind the success and subtleties of neural network-based machine learning. In the tradition of good old applied mathematics, we will not only give attention to rigorous mathematical results, but also to the insight we have gained from careful numerical experiments as well as from the analysis of simplified models. Along the way, we also list the open problems that we believe to be the most important topics for further study. This is not a complete overview of this quickly moving field, but we hope to provide a perspective which may be helpful, especially to new researchers in the area.
Complexity Measures for Neural Networks with General Activation Functions Using Path-based Norms
A simple approach is proposed to obtain complexity controls for neural networks with general activation functions. The approach is motivated by approximating the general activation functions with one-dimensional ReLU networks, which reduces the problem to the complexity controls of ReLU networks. Specifically, we consider two-layer networks and deep residual networks, for which path-based norms are derived to control complexities. We also provide preliminary analyses of the function spaces induced by these norms and a priori estimates of the corresponding regularized estimators.
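As a concrete reference point, for a two-layer ReLU network $f(x) = \sum_{k=1}^{m} a_k\,\mathrm{ReLU}(w_k^{\top} x)$ the path norm used in such complexity controls is, in our notation,
\[
\|\theta\|_{\mathcal{P}} \;=\; \sum_{k=1}^{m} |a_k|\,\|w_k\|_{1},
\]
and the approach described above handles a general activation by approximating it with a one-dimensional ReLU network, so that networks with general activations inherit controls of this form; the precise norms for deep residual networks are derived in the paper.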
A Qualitative Study of the Dynamic Behavior of Adaptive Gradient Algorithms
The dynamic behavior of the RMSprop and Adam algorithms is studied through a combination of careful numerical experiments and theoretical explanations. Three types of qualitative features are observed in the training loss curve: fast initial convergence, oscillations and large spikes. The sign gradient descent (signGD) algorithm, which is the limit of Adam when the learning rate is taken to $0$ while the momentum parameters are kept fixed, is used to explain the fast initial convergence. For the late phase of Adam, three different types of qualitative patterns are observed depending on the choice of hyper-parameters: oscillations, spikes and divergence. In particular, Adam converges faster and more smoothly when the values of the two momentum factors are close to each other.
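The signGD limit mentioned above can be written, in our notation, as the coordinate-wise dynamics
\[
\frac{\mathrm{d}\theta}{\mathrm{d}t} \;=\; -\,\mathrm{sign}\!\left(\nabla L(\theta)\right).
\]
Roughly speaking, it arises from Adam's update $-\eta\, m_t / (\sqrt{v_t} + \epsilon)$ because, as the learning rate $\eta \to 0$ with the momentum parameters held fixed, the first- and second-moment estimates track the instantaneous gradient and its square, so the update direction approaches $g/|g| = \mathrm{sign}(g)$ in each coordinate; the rigorous statement is in the paper.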