
Collaborating Authors: Wu, Chenwei


Towards Understanding the Data Dependency of Mixup-style Training

arXiv.org Artificial Intelligence

In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels. Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk and exhibit better generalization and robustness on various tasks when compared to standard training. In this paper, we investigate how these benefits of Mixup training rely on properties of the data in the context of classification. For minimizing the original empirical risk, we compute a closed form for the Mixup-optimal classification, which allows us to construct a simple dataset on which minimizing the Mixup loss can provably lead to learning a classifier that does not minimize the empirical loss on the data. On the other hand, we also give sufficient conditions for Mixup training to also minimize the original empirical risk. For generalization, we characterize the margin of a Mixup classifier, and use this to understand why the decision boundary of a Mixup classifier can adapt better to the full structure of the training data when compared to standard training. In contrast, we also show that, for a large class of linear models and linearly separable datasets, Mixup training leads to learning the same classifier as standard training.
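As a rough illustration of the Mixup paradigm described above, the following minimal NumPy sketch forms convex combinations of a batch of data points and their one-hot labels. The Beta(alpha, alpha) mixing distribution is standard in the Mixup literature; the function name and toy data here are illustrative, not from the paper.

import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Return convex combinations of a batch of inputs and one-hot labels.

    x: (n, d) array of data points; y: (n, k) array of one-hot labels.
    lam ~ Beta(alpha, alpha) controls how far the mixed points lie
    from the true data points.
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix

# Example: two points in R^2 with opposite labels. A model trained on
# (x_mix, y_mix) rarely sees a true data point, as the abstract notes.
x = np.array([[0.0, 1.0], [1.0, 0.0]])
y = np.array([[1.0, 0.0], [0.0, 1.0]])
x_mix, y_mix = mixup_batch(x, y)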


Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks

arXiv.org Machine Learning

The Hessian captures important properties of the deep neural network loss landscape. We observe that the eigenvectors and eigenspaces of the layer-wise Hessian for the neural network objective have several interesting structures -- top eigenspaces for different models have high overlap, and top eigenvectors form low-rank matrices when reshaped to the same shape as the corresponding weight matrix. These structures, as well as the low-rank structure of the Hessian observed in previous studies, can be explained by approximating the Hessian using Kronecker factorization. Our new understanding can also explain why some of these structures become weaker when the network is trained with batch normalization. Finally, we show that the Kronecker factorization can be combined with PAC-Bayes techniques to get better explicit generalization bounds.
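The Kronecker factorization mentioned above can be sanity-checked on a toy linear layer y = W x: with a column-major vec(W), the per-sample Hessian of the loss with respect to vec(W) is (x x^T) kron H_y, where H_y is the Hessian with respect to the layer output. The sketch below (illustrative dimensions, and a sample-independent quadratic-loss H_y, are assumptions) verifies that averaging over samples then gives exactly E[x x^T] kron H_y; with a sample-dependent H_y this becomes an approximation.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n = 3, 2, 500

# Toy layer: y = W x, with the same PSD output Hessian H_y for every sample.
X = rng.normal(size=(n, d_in))
H_y = np.array([[2.0, 0.5], [0.5, 1.0]])

# Exact layer-wise Hessian w.r.t. vec(W): H = E[(x x^T) kron H_y].
H_exact = np.mean([np.kron(np.outer(x, x), H_y) for x in X], axis=0)

# Kronecker-factored approximation: E[x x^T] kron H_y.
H_kfac = np.kron(X.T @ X / n, H_y)

# Exact here because H_y does not depend on the sample.
print(np.allclose(H_exact, H_kfac))  # True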


Beyond Lazy Training for Over-parameterized Tensor Decomposition

arXiv.org Machine Learning

Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optima. In this paper we study a closely related tensor decomposition problem: given an $l$-th order tensor in $(R^d)^{\otimes l}$ of rank $r$ (where $r\ll d$), can variants of gradient descent find a rank-$m$ decomposition where $m > r$? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log d)$. Our results show that gradient descent on an over-parametrized objective can go beyond the lazy training regime and utilize certain low-rank structure in the data.
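To make the over-parameterized objective concrete, the sketch below runs plain gradient descent on $f(U) = \|T - \sum_j u_j^{\otimes 3}\|_F^2$ for an order-3 tensor, using $m > r$ components from a small random initialization. The dimensions, normalization, step size, and iteration count are illustrative choices and may need tuning; this is a toy instance of the problem setup, not the paper's analyzed algorithm.

import numpy as np

rng = np.random.default_rng(0)
d, r, m = 8, 2, 20          # m > r: over-parameterized components

# Ground-truth rank-r third-order tensor T = sum_i a_i (x) a_i (x) a_i,
# with unit-norm components so the scales stay tame.
A = rng.normal(size=(r, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)
T = np.einsum('ia,ib,ic->abc', A, A, A)

# Model: T_hat = sum_j u_j (x) u_j (x) u_j, fit by gradient descent
# from a small random initialization.
U = 0.01 * rng.normal(size=(m, d))
eta = 0.05
for step in range(5000):
    T_hat = np.einsum('ja,jb,jc->abc', U, U, U)
    R = T - T_hat                                   # residual tensor
    grad = -6 * np.einsum('abc,jb,jc->ja', R, U, U) # d f / d u_j
    U -= eta * grad

print(np.linalg.norm(T - np.einsum('ja,jb,jc->abc', U, U, U)))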


Secure Data Sharing With Flow Model

arXiv.org Machine Learning

In the classical multi-party computation setting, multiple parties jointly compute a function without revealing their own input data. We consider a variant of this problem where, instead of requiring the data to be completely private so that no one gets any information about it, we only require the data to be partially private: the input data can be shared for machine learning training purposes, and they are encrypted so that no one can efficiently recover the original data, yet users can still extract other useful information from the encrypted data. Although different, our requirement has the flavor of differential privacy (Dwork et al., 2006); e.g., users can obtain the average salary of all employees, but cannot figure out the salary of each individual. We present a method for this task based on flow models.
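The flow models in the title are invertible networks. As a hedged illustration of that ingredient only (not the paper's actual scheme), the sketch below implements a RealNVP-style affine coupling layer, a standard flow building block: whoever holds the parameters can invert it exactly, while the transformed output need not resemble the input.

import numpy as np

def coupling_forward(x, w_s, w_t):
    # Split the input; transform the second half conditioned on the first.
    x1, x2 = np.split(x, 2)
    s = np.tanh(w_s @ x1)        # log-scale, squashed for stability
    t = w_t @ x1                 # shift
    z2 = x2 * np.exp(s) + t
    return np.concatenate([x1, z2])

def coupling_inverse(z, w_s, w_t):
    z1, z2 = np.split(z, 2)
    s = np.tanh(w_s @ z1)
    t = w_t @ z1
    x2 = (z2 - t) * np.exp(-s)
    return np.concatenate([z1, x2])

rng = np.random.default_rng(0)
d = 4
w_s = rng.normal(size=(d // 2, d // 2))
w_t = rng.normal(size=(d // 2, d // 2))
x = rng.normal(size=d)
z = coupling_forward(x, w_s, w_t)
print(np.allclose(coupling_inverse(z, w_s, w_t), x))  # True: exactly invertible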


Guarantees for Tuning the Step Size using a Learning-to-Learn Approach

arXiv.org Machine Learning

Learning-to-learn (using optimization algorithms to learn a new optimizer) has successfully trained efficient optimizers in practice. This approach relies on meta-gradient descent on a meta-objective based on the trajectory that the optimizer generates. However, there have been few theoretical guarantees on how to avoid meta-gradient explosion/vanishing, or on how to train an optimizer with good generalization performance. In this paper, we study the learning-to-learn approach on a simple problem of tuning the step size for quadratic loss. Our results show that although there is a way to design the meta-objective so that the meta-gradient remains polynomially bounded, computing the meta-gradient directly using backpropagation leads to numerical issues that look similar to gradient explosion/vanishing problems. We also characterize when it is necessary to compute the meta-objective on a separate validation set instead of the original training set. Finally, we verify our results empirically and show that a similar phenomenon appears even for more complicated learned optimizers parametrized by neural networks.
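The step-size-tuning setting above can be worked out in closed form for a 1-D quadratic, which makes the explosion/vanishing behavior of the meta-gradient easy to see; the toy numbers below are illustrative. For f(w) = 0.5 * lam * w^2, gradient descent gives w_t = (1 - eta*lam)^t * w_0, so the meta-objective L(eta) = f(w_t) has meta-gradient dL/deta = -lam^2 * t * w_0^2 * (1 - eta*lam)^(2t - 1).

import numpy as np

lam, w0 = 1.0, 1.0

def meta_gradient(eta, t):
    # Closed-form dL/deta for the meta-objective L(eta) = f(w_t).
    return -lam**2 * t * w0**2 * (1 - eta * lam)**(2 * t - 1)

for t in (5, 20, 50):
    # eta = 0.5 (stable, eta < 2/lam): meta-gradient vanishes
    # exponentially in the horizon t.
    # eta = 2.5 (unstable, eta > 2/lam): meta-gradient explodes
    # exponentially in t, the numerical issue highlighted above.
    print(t, meta_gradient(0.5, t), meta_gradient(2.5, t))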