AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Provable Convergence of Variational Monte Carlo Methods

Li, Tianyou, Chen, Fan, Chen, Huajie, Wen, Zaiwen

arXiv.org Machine LearningMar-19-2023

The Variational Monte Carlo (VMC) is a promising approach for computing the ground state energy of many-body quantum problems and attracts more and more interests due to the development of machine learning. The recent paradigms in VMC construct neural networks as trial wave functions, sample quantum configurations using Markov chain Monte Carlo (MCMC) and train neural networks with stochastic gradient descent (SGD) method. However, the theoretical convergence of VMC is still unknown when SGD interacts with MCMC sampling given a well-designed trial wave function. Since MCMC reduces the difficulty of estimating gradients, it has inevitable bias in practice. Moreover, the local energy may be unbounded, which makes it harder to analyze the error of MCMC sampling. Therefore, we assume that the local energy is sub-exponential and use the Bernstein inequality for non-stationary Markov chains to derive error bounds of the MCMC estimator. Consequently, VMC is proven to have a first order convergence rate $O(\log K/\sqrt{n K})$ with $K$ iterations and a sample size $n$. It partially explains how MCMC influences the behavior of SGD. Furthermore, we verify the so-called correlated negative curvature condition and relate it to the zero-variance phenomena in solving eigenvalue functions. It is shown that VMC escapes from saddle points and reaches $(\epsilon,\epsilon^{1/4})$ -approximate second order stationary points or $\epsilon^{1/2}$-variance points in at least $O(\epsilon^{-11/2}\log^{2}(1/\epsilon) )$ steps with high probability. Our analysis enriches the understanding of how VMC converges efficiently and can be applied to general variational methods in physics and statistics.

artificial intelligence, inequality, machine learning, (18 more...)

arXiv.org Machine Learning

2303.10599

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Byzantine-Resilient Federated Learning at Edge

Tao, Youming, Cui, Sijia, Xu, Wenlu, Yin, Haofei, Yu, Dongxiao, Liang, Weifa, Cheng, Xiuzhen

arXiv.org Artificial IntelligenceMar-18-2023

Both Byzantine resilience and communication efficiency have attracted tremendous attention recently for their significance in edge federated learning. However, most existing algorithms may fail when dealing with real-world irregular data that behaves in a heavy-tailed manner. To address this issue, we study the stochastic convex and non-convex optimization problem for federated learning at edge and show how to handle heavy-tailed data while retaining the Byzantine resilience, communication efficiency and the optimal statistical error rates simultaneously. Specifically, we first present a Byzantine-resilient distributed gradient descent algorithm that can handle the heavy-tailed data and meanwhile converge under the standard assumptions. To reduce the communication overhead, we further propose another algorithm that incorporates gradient compression techniques to save communication costs during the learning process. Theoretical analysis shows that our algorithms achieve order-optimal statistical error rate in presence of Byzantine devices. Finally, we conduct extensive experiments on both synthetic and real-world datasets to verify the efficacy of our algorithms.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2303.10434

Country:

Asia > China (0.93)
North America > United States (0.67)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.67)
Education (0.46)
Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Learning Gradients of Convex Functions with Monotone Gradient Networks

Chaudhari, Shreyas, Pranav, Srinivasa, Moura, José M. F.

arXiv.org Artificial IntelligenceMar-17-2023

However, it is often a laborious process that involves While much effort has been devoted to deriving and analyzing manually designing suitable convex objectives and associated effective convex formulations of signal processing problems, convex constraints. Perhaps more important than the the gradients of convex functions also have critical applications objective function itself is the gradient of the function, since ranging from gradient-based optimization to optimal most convex problems are solved using computationally frugal transport. Recent works have explored data-driven methods gradient-based methods. Monotone gradient maps of convex for learning convex objective functions, but learning their functions also have critical applications in domains including monotone gradients is seldom studied. In this work, we propose gradient-based optimization, generalized linear models, C-MGN and M-MGN, two monotone gradient neural network linear inverse problems, and optimal transport. Therefore, architectures for directly learning the gradients of convex in this work, we propose to learn the gradient of convex functions. We show that, compared to state of the art functions in a data-driven manner using deep learning. Our methods, our networks are easier to train, learn monotone approach is a fundamental step toward blending strengths of gradient fields more accurately, and use significantly fewer both deep learning and convex optimization and offers a wide parameters. We further demonstrate their ability to learn optimal array of applications in data science and signal processing.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2301.10862

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Fairness-aware Differentially Private Collaborative Filtering

Yang, Zhenhuan, Ge, Yingqiang, Su, Congzhe, Wang, Dingxian, Zhao, Xiaoting, Ying, Yiming

arXiv.org Artificial IntelligenceMar-16-2023

Recently, there has been an increasing adoption of differential privacy guided algorithms for privacy-preserving machine learning tasks. However, the use of such algorithms comes with trade-offs in terms of algorithmic fairness, which has been widely acknowledged. Specifically, we have empirically observed that the classical collaborative filtering method, trained by differentially private stochastic gradient descent (DP-SGD), results in a disparate impact on user groups with respect to different user engagement levels. This, in turn, causes the original unfair model to become even more biased against inactive users. To address the above issues, we propose \textbf{DP-Fair}, a two-stage framework for collaborative filtering based algorithms. Specifically, it combines differential privacy mechanisms with fairness constraints to protect user privacy while ensuring fair recommendations. The experimental results, based on Amazon datasets, and user history logs collected from Etsy, one of the largest e-commerce platforms, demonstrate that our proposed method exhibits superior performance in terms of both overall accuracy and user group fairness on both shallow and deep recommendation models compared to vanilla DP-SGD.

artificial intelligence, dp-sgd, machine learning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3543873.3587577

2303.09527

Country:

North America > United States > Texas > Travis County > Austin (0.05)
North America > United States > New York > Kings County > New York City (0.05)
North America > United States > New York > New York County > New York City (0.05)
(4 more...)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

SoftZoo: A Soft Robot Co-design Benchmark For Locomotion In Diverse Environments

Wang, Tsun-Hsuan, Ma, Pingchuan, Spielberg, Andrew Everett, Xian, Zhou, Zhang, Hao, Tenenbaum, Joshua B., Rus, Daniela, Gan, Chuang

arXiv.org Artificial IntelligenceMar-16-2023

While significant research progress has been made in robot learning for control, unique challenges arise when simultaneously co-optimizing morphology. Existing work has typically been tailored for particular environments or representations. In order to more fully understand inherent design and performance tradeoffs and accelerate the development of new breeds of soft robots, a comprehensive virtual platform with well-established tasks, environments, and evaluation metrics is needed. In this work, we introduce SoftZoo, a soft robot co-design platform for locomotion in diverse environments. SoftZoo supports an extensive, naturally-inspired material set, including the ability to simulate environments such as flat ground, desert, wetland, clay, ice, snow, shallow water, and ocean. Further, it provides a variety of tasks relevant for soft robotics, including fast locomotion, agile turning, and path following, as well as differentiable design representations for morphology and control. Combined, these elements form a feature-rich platform for analysis and development of soft robot co-design algorithms. We benchmark prevalent representations and co-design algorithms, and shed light on 1) the interplay between environment, morphology, and behavior; 2) the importance of design space representations; 3) the ambiguity in muscle formation and controller synthesis; and 4) the value of differentiable physics. We envision that SoftZoo will serve as a standard platform and template an approach toward the development of novel representations and algorithms for co-designing soft robots' behavioral and morphological intelligence.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2303.09555

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(2 more...)

Add feedback

Controlled Descent Training

Andersson, Viktor, Varga, Balázs, Szolnoky, Vincent, Syrén, Andreas, Jörnsten, Rebecka, Kulcsár, Balázs

arXiv.org Artificial IntelligenceMar-16-2023

In this work, a novel and model-based artificial neural network (ANN) training method is developed supported by optimal control theory. The method augments training labels in order to robustly guarantee training loss convergence and improve training convergence rate. Dynamic label augmentation is proposed within the framework of gradient descent training where the convergence of training loss is controlled. First, we capture the training behavior with the help of empirical Neural Tangent Kernels (NTK) and borrow tools from systems and control theory to analyze both the local and global training dynamics (e.g. stability, reachability). Second, we propose to dynamically alter the gradient descent training mechanism via fictitious labels as control inputs and an optimal state feedback policy. In this way, we enforce locally $\mathcal{H}_2$ optimal and convergent training behavior. The novel algorithm, \textit{Controlled Descent Training} (CDT), guarantees local convergence. CDT unleashes new potentials in the analysis, interpretation, and design of ANN architectures. The applicability of the method is demonstrated on standard regression and classification problems.

artificial intelligence, machine learning, training dynamic, (18 more...)

arXiv.org Artificial Intelligence

2303.09216

Country:

North America > United States > New York (0.04)
North America > United States > Michigan (0.04)
North America > United States > Iowa (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

Decentralized Riemannian natural gradient methods with Kronecker-product approximations

Hu, Jiang, Deng, Kangkang, Li, Na, Li, Quanzheng

arXiv.org Artificial IntelligenceMar-16-2023

With a computationally efficient approximation of the second-order information, natural gradient methods have been successful in solving large-scale structured optimization problems. We study the natural gradient methods for the large-scale decentralized optimization problems on Riemannian manifolds, where the local objective function defined by the local dataset is of a log-probability type. By utilizing the structure of the Riemannian Fisher information matrix (RFIM), we present an efficient decentralized Riemannian natural gradient descent (DRNGD) method. To overcome the communication issue of the high-dimension RFIM, we consider a class of structured problems for which the RFIM can be approximated by a Kronecker product of two low-dimension matrices. By performing the communications over the Kronecker factors, a high-quality approximation of the RFIM can be obtained in a low cost. We prove that DRNGD converges to a stationary point with the best-known rate of $\mathcal{O}(1/K)$. Numerical experiments demonstrate the efficiency of our proposed method compared with the state-of-the-art ones. To the best of our knowledge, this is the first Riemannian second-order method for solving decentralized manifold optimization problems.

artificial intelligence, machine learning, natural gradient method, (14 more...)

arXiv.org Artificial Intelligence

2303.09611

Country:

North America > United States > Massachusetts (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Add feedback

Progress in Nonsmooth Optimization part4(Machine Learning)

#artificialintelligenceMar-15-2023, 07:00:21 GMT

Abstract:: We propose and analyze several stochastic gradient algorithms for finding stationary points or local minimum in nonconvex, possibly with nonsmooth regularizer, finite-sum and online optimization problems. First, we propose a simple proximal stochastic gradient algorithm based on variance reduction called ProxSVRG . We provide a clean and tight analysis of ProxSVRG, which shows that it outperforms the deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, hence solves an open problem proposed in Reddi et al. (2016b). Also, ProxSVRG uses much less proximal oracle calls than ProxSVRG (Reddi et al., 2016b) and extends to the online setting by avoiding full gradient computations. Then, we further propose an optimal algorithm, called SSRGD, based on SARAH (Nguyen et al., 2017) and show that SSRGD further improves the gradient complexity of ProxSVRG and achieves the optimal upper bound, matching the known lower bound of (Fang et al., 2018; Li et al., 2021).

algorithm, machine learning, nonsmooth optimization part4, (10 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.80)

Add feedback

Neural Networks Efficiently Learn Low-Dimensional Representations with SGD

Mousavi-Hosseini, Alireza, Park, Sejun, Girotti, Manuela, Mitliagkas, Ioannis, Erdogdu, Murat A.

arXiv.org Artificial IntelligenceMar-15-2023

We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in \mathbb{R}^d$ is Gaussian and the target $y \in \mathbb{R}$ follows a multiple-index model, i.e., $y=g(\langle\boldsymbol{u_1},\boldsymbol{x}\rangle,...,\langle\boldsymbol{u_k},\boldsymbol{x}\rangle)$ with a noisy link function $g$. We prove that the first-layer weights of the NN converge to the $k$-dimensional principal subspace spanned by the vectors $\boldsymbol{u_1},...,\boldsymbol{u_k}$ of the true model, when online SGD with weight decay is used for training. This phenomenon has several important consequences when $k \ll d$. First, by employing uniform convergence on this smaller subspace, we establish a generalization error bound of $O(\sqrt{{kd}/{T}})$ after $T$ iterations of SGD, which is independent of the width of the NN. We further demonstrate that, SGD-trained ReLU NNs can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle) + \epsilon$ by recovering the principal direction, with a sample complexity linear in $d$ (up to log factors), where $f$ is a monotonic function with at most polynomial growth, and $\epsilon$ is the noise. This is in contrast to the known $d^{\Omega(p)}$ sample requirement to learn any degree $p$ polynomial in the kernel regime, and it shows that NNs trained with SGD can outperform the neural tangent kernel at initialization. Finally, we also provide compressibility guarantees for NNs using the approximate low-rank structure produced by SGD.

artificial intelligence, machine learning, probability, (15 more...)

arXiv.org Artificial Intelligence

2209.14863

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Add feedback

Learning Fractals by Gradient Descent

Tu, Cheng-Hao, Chen, Hong-You, Carlyn, David, Chao, Wei-Lun

arXiv.org Artificial IntelligenceMar-14-2023

Fractals are geometric shapes that can display complex and self-similar patterns found in nature (e.g., clouds and plants). Recent works in visual recognition have leveraged this property to create random fractal images for model pre-training. In this paper, we study the inverse problem -- given a target image (not necessarily a fractal), we aim to generate a fractal image that looks like it. We propose a novel approach that learns the parameters underlying a fractal image via gradient descent. We show that our approach can find fractal parameters of high visual quality and be compatible with different loss functions, opening up several potentials, e.g., learning fractals for downstream tasks, scientific understanding, etc.

artificial intelligence, fractal, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2303.12722

Country:

North America > United States > Ohio (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)

Add feedback