AITopics | An, Jing

Collaborating Authors

An, Jing

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Convergence of two-timescale gradient descent ascent dynamics: finite-dimensional and mean-field perspectives

An, Jing, Lu, Jianfeng

arXiv.org Artificial IntelligenceJan-28-2025

The two-timescale gradient descent-ascent (GDA) is a canonical gradient algorithm designed to find Nash equilibria in min-max games. We analyze the two-timescale GDA by investigating the effects of learning rate ratios on convergence behavior in both finite-dimensional and mean-field settings. In particular, for finite-dimensional quadratic min-max games, we obtain long-time convergence in near quasi-static regimes through the hypocoercivity method. For mean-field GDA dynamics, we investigate convergence under a finite-scale ratio using a mixed synchronous-reflection coupling technique.

artificial intelligence, conference, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2501.17122

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Add feedback

Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss

Bréchet, Pierre, Papagiannouli, Katerina, An, Jing, Montúfar, Guido

arXiv.org Artificial IntelligenceJul-13-2023

We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. While recent works have made advances in the study of the optimization problem for overparametrized low-rank matrix approximation, much emphasis has been placed on discriminative settings and the square loss. In contrast, our model considers another type of loss and connects with the generative setting. We characterize the critical points and minimizers of the Bures-Wasserstein distance over the space of rank-bounded matrices. The Hessian of this loss at low-rank matrices can theoretically blow up, which creates challenges to analyze convergence of gradient optimization methods. We establish convergence results for gradient flow using a smooth perturbative version of the loss as well as convergence results for finite step size gradient descent under certain assumptions on the initial weights.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Artificial Intelligence

2303.03027

Country:

Europe (0.67)
North America > United States > Hawaii (0.14)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Add feedback

Convergence of stochastic gradient descent under a local Lajasiewicz condition for deep neural networks

An, Jing, Lu, Jianfeng

arXiv.org Artificial IntelligenceApr-18-2023

We extend the global convergence result of Chatterjee \cite{chatterjee2022convergence} by considering the stochastic gradient descent (SGD) for non-convex objective functions. With minimal additional assumptions that can be realized by finitely wide neural networks, we prove that if we initialize inside a local region where the \L{}ajasiewicz condition holds, with a positive probability, the stochastic gradient iterates converge to a global minimum inside this region. A key component of our proof is to ensure that the whole trajectories of SGD stay inside the local region with a positive probability. For that, we assume the SGD noise scales with the objective function, which is called machine learning noise and achievable in many real examples. Furthermore, we provide a negative argument to show why using the boundedness of noise with Robbins-Monro type step sizes is not enough to keep the key component valid.

artificial intelligence, local lajasiewicz condition, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2304.09221

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Why resampling outperforms reweighting for correcting sampling bias

An, Jing, Ying, Lexing, Zhu, Yuhua

arXiv.org Machine LearningSep-28-2020

A data set sampled from a certain population is biased if the subgroups of the population are sampled at proportions that are significantly different from their underlying proportions. Training machine learning models on biased data sets requires correction techniques to compensate for potential biases. We consider two commonly-used techniques, resampling and reweighting, that rebalance the proportions of the subgroups to maintain the desired objective function. Though statistically equivalent, it has been observed that reweighting outperforms resampling when combined with stochastic gradient algorithms. By analyzing illustrative examples, we explain the reason behind this phenomenon using tools from dynamical stability and stochastic asymptotics. We also present experiments from regression, classification, and off-policy prediction to demonstrate that this is a general phenomenon. We argue that it is imperative to consider the objective function design and the optimization algorithm together while addressing the sampling bias.

optimization problem, proportion, survey article, (19 more...)

arXiv.org Machine Learning

2009.13447

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Add feedback

Stochastic modified equations for the asynchronous stochastic gradient descent

An, Jing, Lu, Jianfeng, Ying, Lexing

arXiv.org Machine LearningMay-21-2018

We propose a stochastic modified equations (SME) for modeling the asynchronous stochastic gradient descent (ASGD) algorithms. The resulting SME of Langevin type extracts more information about the ASGD dynamics and elucidates the relationship between different types of stochastic gradient algorithms. We show the convergence of ASGD to the SME in the continuous time limit, as well as the SME's precise prediction to the trajectories of ASGD with various forcing terms. As an application of the SME, we propose an optimal mini-batching strategy for ASGD via solving the optimal control problem of the associated SME.

artificial intelligence, equation, machine learning, (17 more...)

arXiv.org Machine Learning

1805.08244

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback