On the linearity of large non-linear models: when and why the tangent kernel is constant
Liu, Chaoyue, Zhu, Libin, Belkin, Mikhail
The goal of this work is to shed light on the remarkable phenomenon of transition to linearity of certain neural networks as their width approaches infinity. We show that the transition to linearity of the model and, equivalently, the constancy of the (neural) tangent kernel (NTK) result from the scaling properties of the norm of the Hessian matrix of the network as a function of the network width. We present a general framework for understanding the constancy of the tangent kernel via Hessian scaling, applicable to the standard classes of neural networks. Our analysis provides a new perspective on the phenomenon of the constant tangent kernel, which is different from the widely accepted "lazy training". Furthermore, we show that the transition to linearity is not a general property of wide neural networks: it does not hold when the last layer of the network is non-linear, and it is not necessary for successful optimization by gradient descent.
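The claim that the tangent kernel becomes constant because the Hessian norm shrinks with width can be checked numerically. The following NumPy sketch is an illustration of that claim, not the paper's code: for a one-hidden-layer tanh network of width $m$ with a fixed output layer, it measures the relative change of the tangent kernel after a single gradient step; under the Hessian scaling described above, this change should shrink as $m$ grows. The architecture, data, and step size are assumptions made purely for illustration.

```python
# Minimal numerical illustration (not the paper's code): for a one-hidden-layer
# tanh network f(x) = v . tanh(W x) / sqrt(m) with only W trained, measure how
# much the tangent kernel K_ij = <grad_W f(x_i), grad_W f(x_j)> changes after one
# gradient-descent step, as a function of the width m. The claim above predicts
# that this change vanishes as m grows, since the Hessian norm scales like 1/sqrt(m).
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 5                                 # input dimension, number of samples
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def grad_f(W, v, x):
    """Gradient of f w.r.t. W, flattened: rows are (1/sqrt(m)) * v_i * tanh'(w_i . x) * x."""
    m = W.shape[0]
    s = 1.0 / np.cosh(W @ x) ** 2            # tanh'(W x)
    return (np.outer(v * s, x) / np.sqrt(m)).ravel()

def tangent_kernel(W, v, X):
    """K_ij = <grad_W f(x_i), grad_W f(x_j)> over the trainable parameters W."""
    G = np.stack([grad_f(W, v, x) for x in X])
    return G @ G.T

for m in [10, 100, 1000, 10000]:
    W = rng.normal(size=(m, d))
    v = rng.choice([-1.0, 1.0], size=m)      # fixed (untrained) output layer
    K0 = tangent_kernel(W, v, X)

    # one gradient-descent step on the mean squared loss
    preds = np.array([v @ np.tanh(W @ x) / np.sqrt(m) for x in X])
    grad_W = np.zeros_like(W)
    for r, x in zip(preds - y, X):
        grad_W += r * np.outer(v / np.cosh(W @ x) ** 2, x) / np.sqrt(m)
    W = W - grad_W / n

    K1 = tangent_kernel(W, v, X)
    print(f"m = {m:6d}   relative change of tangent kernel: "
          f"{np.linalg.norm(K1 - K0) / np.linalg.norm(K0):.2e}")
```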
MaSS: an Accelerated Stochastic Method for Over-parametrized Learning
Liu, Chaoyue, Belkin, Mikhail
Stochastic gradient-based methods are dominant in optimization for most large-scale machine learning problems, due to their computational simplicity and their compatibility with modern parallel hardware such as GPUs. In most cases these methods use over-parametrized models that allow for interpolation, i.e., perfect fitting of the training data. While we do not yet have a full understanding of why these solutions generalize, a wealth of empirical evidence indicates that they do (e.g., [22, 2]), and we are beginning to recognize their desirable properties for optimization, particularly in the SGD setting [11]. In this paper, we leverage the power of the interpolated setting to propose MaSS (Momentum-added Stochastic Solver), a stochastic momentum method for efficient training of over-parametrized models; see the pseudocode in Appendix A. The algorithm keeps two variables (weights), $w$ and $u$.
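Since the abstract describes the algorithm only at a high level (two weight variables $w$ and $u$, with pseudocode deferred to Appendix A), here is a hedged Python sketch of a Nesterov-style two-variable stochastic momentum update with an added compensation term, in the spirit of the method described. The exact update rule, the hyperparameter names (eta1, eta2, gamma) and values, and the toy interpolation problem are illustrative assumptions, not a transcription of Appendix A.

```python
# Hedged sketch of a two-variable stochastic momentum update in the spirit of
# MaSS: a Nesterov-SGD-style step for w plus a compensation term (eta2 * g) in
# the update of u. The precise rule and the hyperparameters eta1, eta2, gamma
# are illustrative assumptions, not a transcription of the paper's Appendix A.
import numpy as np

def mass_step(w, u, stoch_grad, eta1=0.005, eta2=0.001, gamma=0.5):
    """One update of the two weight variables w (iterate) and u (lookahead point)."""
    g = stoch_grad(u)                                         # stochastic gradient at u
    w_next = u - eta1 * g                                     # descent step
    u_next = (1.0 + gamma) * w_next - gamma * w + eta2 * g    # momentum + compensation
    return w_next, u_next

# Toy usage: over-parametrized least squares (more unknowns than equations),
# where an interpolating solution A x = b exists.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(20, 50)), rng.normal(size=20)
w = u = np.zeros(50)
for _ in range(5000):
    i = rng.integers(20)                                      # mini-batch of size 1
    w, u = mass_step(w, u, lambda x: (A[i] @ x - b[i]) * A[i])
print("residual norm after training:", np.linalg.norm(A @ w - b))
```

The design point this sketch tries to convey is that, unlike plain SGD with Nesterov momentum (which keeps the same two variables but omits the extra term), the update of $u$ includes an additional gradient term controlled by a second step size, which is what the abstract's "momentum-added" naming refers to.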
Clustering with Bregman Divergences: an Asymptotic Analysis
Liu, Chaoyue, Belkin, Mikhail
Clustering, in particular $k$-means clustering, is a central topic in data analysis. Clustering with Bregman divergences is a recently proposed generalization of $k$-means clustering which has already been widely used in applications. In this paper we analyze theoretical properties of Bregman clustering when the number of clusters $k$ is large. We establish quantization rates and describe the limiting distribution of the centers as $k\to \infty$, extending well-known results for $k$-means clustering.
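For concreteness, the Bregman generalization of $k$-means alternates an assignment step, which maps each point to the nearest center under a Bregman divergence $D_\phi$, with an update step that, for any convex $\phi$, is simply the arithmetic mean of the assigned points. The sketch below is a generic illustration of that Lloyd-style algorithm, not of the asymptotic analysis in the paper; the choice of $\phi$ (squared Euclidean norm, recovering ordinary $k$-means) and the synthetic data are placeholders.

```python
# Generic sketch of Bregman hard clustering (the k-means generalization referred
# to above); this illustrates the algorithm, not the paper's asymptotic analysis.
import numpy as np

def bregman_divergence(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

def bregman_kmeans(X, k, phi, grad_phi, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # assignment step: nearest center under the Bregman divergence
        D = np.array([[bregman_divergence(phi, grad_phi, x, c) for c in centers]
                      for x in X])
        labels = D.argmin(axis=1)
        # update step: for any Bregman divergence the optimal center of a
        # cluster is the arithmetic mean of its points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# phi(x) = ||x||^2 gives D_phi(x, y) = ||x - y||^2, i.e., ordinary k-means
X = np.random.default_rng(1).normal(size=(300, 2))
centers, labels = bregman_kmeans(X, k=4,
                                 phi=lambda x: x @ x,
                                 grad_phi=lambda x: 2 * x)
print(centers)
```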