
Collaborating Authors

 Ji, Ziwei


Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks

arXiv.org Machine Learning

Recent work has revealed that overparameterized networks trained by gradient descent achieve arbitrarily low training error, and sometimes even low test error. The required width, however, is always polynomial in at least one of the sample size $n$, the (inverse) training error $1/\epsilon$, and the (inverse) failure probability $1/\delta$. This work shows that $\widetilde{O}(1/\epsilon)$ iterations of gradient descent on two-layer networks of any width exceeding $\mathrm{polylog}(n,1/\epsilon,1/\delta)$, together with $\widetilde{\Omega}(1/\epsilon^2)$ training examples, suffice to achieve a test error of $\epsilon$. The analysis further relies upon a margin property of the limiting kernel; this margin is guaranteed positive and can distinguish between true labels and random labels.
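A minimal NumPy sketch of this setting, not the paper's construction: the width, step size, iteration count, and data model below are arbitrary illustrative choices, and only the hidden layer is trained, a common NTK-style simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 200, 10, 64          # sample size, input dim, hidden width (illustrative)

# Synthetic distribution with a linear ground truth, so low test error is attainable.
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)

def sample(m):
    X = rng.normal(size=(m, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X, np.sign(X @ w_star)

X, y = sample(n)
X_test, y_test = sample(1000)

# Standard Gaussian initialization; the outer layer is frozen at random signs.
W = rng.normal(size=(width, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)

def forward(X, W):
    return np.maximum(X @ W.T, 0.0) @ a          # two-layer ReLU network output

lr = 1.0
for t in range(1000):
    z = forward(X, W)
    p = np.exp(-np.logaddexp(0.0, y * z))        # sigmoid(-y*z): logistic-loss factor
    mask = (X @ W.T > 0).astype(float)           # ReLU derivative
    W -= lr * (((-y * p)[:, None] * mask * a[None, :]).T @ X) / n

print("train error:", np.mean(np.sign(forward(X, W)) != y))
print("test error: ", np.mean(np.sign(forward(X_test, W)) != y_test))
```

Even at this modest width, both printed errors should be small, which is the regime the polylogarithmic-width result speaks to.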


Approximation power of random neural networks

arXiv.org Machine Learning

This paper investigates the approximation power of three types of random neural networks: (a) infinite width networks, with weights following an arbitrary distribution; (b) finite width networks obtained by subsampling the preceding infinite width networks; (c) finite width networks obtained by starting with standard Gaussian initialization, and then adding a vanishingly small correction to the weights. The primary result is a fully quantified bound on the rate of approximation of general continuous functions: in all three cases, a function $f$ can be approximated with complexity $\|f\|_1 (d/\delta)^{\mathcal{O}(d)}$, where $\delta$ depends on continuity properties of $f$ and the complexity measure depends on the weight magnitudes and/or cardinalities. Along the way, a variety of ancillary results are developed: an exact construction of Gaussian densities with infinite width networks, an elementary stand-alone proof scheme for approximation via convolutions of radial basis functions, subsampling rates for infinite width networks, and depth separation for corrected networks.
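The finite-width case (b) can be imitated with a tiny random-features experiment; the target function, width, and sampling distributions below are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
d, width, n = 2, 500, 2000

X = rng.uniform(-1, 1, size=(n, d))
f = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])    # an arbitrary continuous target

W = rng.normal(size=(width, d))                  # random inner weights, sampled once
b = rng.uniform(-1, 1, size=width)               # random biases
H = np.maximum(X @ W.T + b, 0.0)                 # frozen random ReLU features

# Only the outer coefficients are fit; the hidden layer stays random.
coef, *_ = np.linalg.lstsq(H, f, rcond=None)
print("RMS approximation error:", np.sqrt(np.mean((H @ coef - f) ** 2)))
```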


A refined primal-dual analysis of the implicit bias

arXiv.org Machine Learning

Recent work shows that gradient descent on linearly separable data is implicitly biased towards the maximum margin solution. However, prior analyses do not give a convergence rate that is tight in both $n$ (the dataset size) and $t$ (the training time). This work proves that the normalized gradient descent iterates converge to the maximum margin solution at a rate of $\mathcal{O}(\ln(n)/\ln(t))$, which is tight in both $n$ and $t$. The proof is via a dual convergence result: gradient descent induces a multiplicative weights update on the (normalized) SVM dual objective, whose convergence rate leads to the tight implicit bias rate.
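A hedged numerical illustration of the phenomenon, with data, step size, and horizon chosen arbitrarily: tracking the normalized margin of the gradient descent iterate, whose slow, logarithmic-in-$t$ growth is what the $\mathcal{O}(\ln(n)/\ln(t))$ bound quantifies.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 5
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)
X += 0.5 * y[:, None] * w_star         # shift to make the data separable with margin

w = np.zeros(d)
lr = 0.5
for t in range(1, 100001):
    m = y * (X @ w)
    p = np.exp(-np.logaddexp(0.0, m))  # sigmoid(-m), numerically stable
    w -= lr * (-(y * p) @ X) / n       # gradient step on the logistic loss
    if t in (10, 100, 1000, 10000, 100000):
        margin = np.min(y * (X @ w)) / np.linalg.norm(w)
        print(f"t={t:>6}  normalized margin = {margin:.4f}")
```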


Gradient descent aligns the layers of deep linear networks

arXiv.org Machine Learning

This paper establishes risk convergence and asymptotic weight matrix alignment --- a form of implicit regularization --- of gradient flow and gradient descent when applied to deep linear networks on linearly separable data. In more detail, for gradient flow applied to strictly decreasing loss functions (with similar results for gradient descent with particular decreasing step sizes): (i) the risk converges to 0; (ii) the normalized i-th weight matrix asymptotically equals its rank-1 approximation $u_iv_i^{\top}$; (iii) these rank-1 matrices are aligned across layers, meaning $|v_{i+1}^{\top}u_i|\to1$. In the case of the logistic loss (binary cross entropy), more can be said: the linear function induced by the network --- the product of its weight matrices --- converges to the same direction as the maximum margin solution. This last property was identified in prior work, but only under assumptions on gradient descent which here are implied by the alignment phenomenon.
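The alignment statements (ii) and (iii) can be checked numerically. The sketch below, with depth, widths, step size, and data all illustrative choices rather than the paper's setup, trains a deep linear network by gradient descent on the logistic loss and reports $|v_{i+1}^{\top}u_i|$ from the SVDs of adjacent weight matrices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, L, h = 100, 4, 3, 8              # samples, input dim, depth, hidden width

w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)
X += 0.5 * y[:, None] * w_star         # make the data linearly separable

dims = [d] + [h] * (L - 1) + [1]
Ws = [0.1 * rng.normal(size=(dims[i + 1], dims[i])) for i in range(L)]

lr = 0.05
for t in range(20000):
    acts = [X.T]                       # forward pass through the linear chain
    for W in Ws:
        acts.append(W @ acts[-1])
    z = acts[-1].ravel()
    g = -(y * np.exp(-np.logaddexp(0.0, y * z))) / n   # dloss/dz, logistic loss
    delta = g[None, :]
    grads = []
    for i in range(L - 1, -1, -1):     # backprop through each weight matrix
        grads.append(delta @ acts[i].T)
        delta = Ws[i].T @ delta
    for W, G in zip(Ws, reversed(grads)):
        W -= lr * G

for i in range(L - 1):                 # alignment of adjacent layers
    u_i = np.linalg.svd(Ws[i])[0][:, 0]        # top left singular vector, layer i
    v_next = np.linalg.svd(Ws[i + 1])[2][0]    # top right singular vector, layer i+1
    print(f"|v_{i + 2}^T u_{i + 1}| = {abs(v_next @ u_i):.4f}")
```

As training proceeds, the printed alignments should approach 1, consistent with the asymptotic rank-1 structure.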


Risk and parameter convergence of logistic regression

arXiv.org Machine Learning

The logistic loss is strictly convex and does not attain its infimum; consequently, the solutions of logistic regression are in general off at infinity. This work provides a convergence analysis of gradient descent applied to logistic regression under no assumptions on the problem instance. Firstly, the risk is shown to converge at rate $\mathcal{O}(\ln(t)^2/t)$. Secondly, the parameter convergence is characterized along a unique pair of complementary subspaces defined by the problem instance: one subspace along which strong convexity induces parameters to converge at rate $\mathcal{O}(\ln(t)^2/\sqrt{t})$, and its orthogonal complement along which separability induces parameters to converge in direction at rate $\mathcal{O}(\ln\ln(t) / \ln(t))$.
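A small numerical check of the risk rate; the problem instance and step size below are arbitrary, and the data are made separable so the infimal risk is $0$. The empirical risk of gradient descent is printed next to a $\ln(t)^2/t$ reference.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 50, 3
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)
X += 0.5 * y[:, None] * w_star         # separable instance: infimal risk is 0

w = np.zeros(d)
lr = 0.5
for t in range(1, 100001):
    m = y * (X @ w)
    if t in (10, 100, 1000, 10000, 100000):
        risk = np.mean(np.logaddexp(0.0, -m))          # empirical logistic risk
        print(f"t={t:>6}  risk = {risk:.3e}   ln(t)^2/t = {np.log(t)**2 / t:.3e}")
    p = np.exp(-np.logaddexp(0.0, m))                  # sigmoid(-m), stable
    w -= lr * (-(y * p) @ X) / n
```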