AITopics

2304.11788

Country: Europe > Italy (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.37)

Jiang, Ruichen, Abolfazli, Nazanin, Mokhtari, Aryan, Hamedani, Erfan Yazdandoost

A Conditional Gradient-based Method for Simple Bilevel Optimization with Convex Lower-level Problem

arXiv.org Artificial IntelligenceApr-23-2023

In this paper, we study a class of bilevel optimization problems, also known as simple bilevel optimization, where we minimize a smooth objective function over the optimal solution set of another convex constrained optimization problem. Several iterative methods have been developed for tackling this class of problems. Alas, their convergence guarantees are either asymptotic for the upper-level objective, or the convergence rates are slow and sub-optimal. To address this issue, in this paper, we introduce a novel bilevel optimization method that locally approximates the solution set of the lower-level problem via a cutting plane, and then runs a conditional gradient update to decrease the upper-level objective. When the upper-level objective is convex, we show that our method requires ${\mathcal{O}}(\max\{1/\epsilon_f,1/\epsilon_g\})$ iterations to find a solution that is $\epsilon_f$-optimal for the upper-level objective and $\epsilon_g$-optimal for the lower-level objective. Moreover, when the upper-level objective is non-convex, our method requires ${\mathcal{O}}(\max\{1/\epsilon_f^2,1/(\epsilon_f\epsilon_g)\})$ iterations to find an $(\epsilon_f,\epsilon_g)$-optimal solution. We also prove stronger convergence guarantees under the H\"olderian error bound assumption on the lower-level problem. To the best of our knowledge, our method achieves the best-known iteration complexity for the considered class of bilevel problems.

artificial intelligence, machine learning, optimization problem, (15 more...)

2206.08868

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Arizona (0.04)
Europe > Spain (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

arXiv.org Artificial IntelligenceApr-23-2023

Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization

Shen, Zebang, Qian, Hui, Mu, Tongzhou, Zhang, Chao

's andP(x) is a block-separable and convex regularization function. We allow both F(x) and P(x) to be non-smooth. In machine learning applications, many tasks can be naturally phrased as the above problem, e.g. the Empirical Risk Minimization (ERM) problem Zhang and Xiao [2015], Friedman et al. [2001]. However, the latest explosive growth of data introduces unprecedented computational challenges of scalability and storage bottleneck. In consequence, algorithms with fast convergence, small memory footprints, and low per-iteration complexity have been ardently pursued in the recent years. Most successful practices adopt stochastic strategies that incorporate randomness into solving procedures. They followed two parallel tracks. That is, in each iteration, gradient is estimated by either a mini batch of samples or a small block of variable coordinates, commonly designated by a random procedure. Accessing only a random mini batch of samples in each iteration, classical Stochastic Gradient Descent (SGD) and its Variance Reduction (VR) variants, such as SVRG Johnson and Zhang [2013], SAGA Defazio et al. [2014], and SAG Schmidt et al. [2013], have gained increasing attention in the last quinquennium.

algorithm, artificial intelligence, machine learning, (13 more...)

2304.11665

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

Maniyar, Mizhaan Prajit, Mondal, Akash, A., Prashanth L., Bhatnagar, Shalabh

A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning

arXiv.org Artificial IntelligenceApr-21-2023

We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary point of the value function. In this paper, we propose two policy Newton algorithms that incorporate cubic regularization. Both algorithms employ the likelihood ratio method to form estimates of the gradient and Hessian of the value function using sample trajectories. The first algorithm requires an exact solution of the cubic regularized problem in each iteration, while the second algorithm employs an efficient gradient descent-based approximation to the cubic regularized problem. We establish convergence of our proposed algorithms to a second-order stationary point (SOSP) of the value function, which results in the avoidance of traps in the form of saddle points. In particular, the sample complexity of our algorithms to find an $\epsilon$-SOSP is $O(\epsilon^{-3.5})$, which is an improvement over the state-of-the-art sample complexity of $O(\epsilon^{-4.5})$.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2304.10951

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Huang, Feihu, Chen, Songcan

Near-Optimal Decentralized Momentum Method for Nonconvex-PL Minimax Problems

arXiv.org Artificial IntelligenceApr-21-2023

Minimax optimization plays an important role in many machine learning tasks such as generative adversarial networks (GANs) and adversarial training. Although recently a wide variety of optimization methods have been proposed to solve the minimax problems, most of them ignore the distributed setting where the data is distributed on multiple workers. Meanwhile, the existing decentralized minimax optimization methods rely on the strictly assumptions such as (strongly) concavity and variational inequality conditions. In the paper, thus, we propose an efficient decentralized momentum-based gradient descent ascent (DM-GDA) method for the distributed nonconvex-PL minimax optimization, which is nonconvex in primal variable and is nonconcave in dual variable and satisfies the Polyak-Lojasiewicz (PL) condition. In particular, our DM-GDA method simultaneously uses the momentum-based techniques to update variables and estimate the stochastic gradients. Moreover, we provide a solid convergence analysis for our DM-GDA method, and prove that it obtains a near-optimal gradient complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of the nonconvex-PL stochastic minimax problems, which reaches the lower bound of nonconvex stochastic optimization. To the best of our knowledge, we first study the decentralized algorithm for Nonconvex-PL stochastic minimax optimization over a network.

artificial intelligence, machine learning, optimization, (15 more...)

2304.10902

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

arXiv.org Artificial IntelligenceApr-21-2023

An Incomplete Tensor Tucker decomposition based Traffic Speed Prediction Method

Mi, Jiajia

In intelligent transport systems, it is common and inevitable with missing data. While complete and valid traffic speed data is of great importance to intelligent transportation systems. A latent factorization-of-tensors (LFT) model is one of the most attractive approaches to solve missing traffic data recovery due to its well-scalability. A LFT model achieves optimization usually via a stochastic gradient descent (SGD) solver, however, the SGD-based LFT suffers from slow convergence. To deal with this issue, this work integrates the unique advantages of the proportional-integral-derivative (PID) controller into a Tucker decomposition based LFT model. It adopts two-fold ideas: a) adopting tucker decomposition to build a LFT model for achieving a better recovery accuracy. b) taking the adjusted instance error based on the PID control theory into the SGD solver to effectively improve convergence rate. Our experimental studies on two major city traffic road speed datasets show that the proposed model achieves significant efficiency gain and highly competitive prediction accuracy.

artificial intelligence, ieee transaction, machine learning, (18 more...)

2304.10961

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Ground > Road (0.70)
Transportation > Infrastructure & Services (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Mishra, Neel, Kumar, Pawan

Angle based dynamic learning rate for gradient descent

In our work, we propose a novel yet simple approach to obtain an adaptive learning rate for gradient-based descent methods on classification tasks. Instead of the traditional approach of selecting adaptive learning rates via the decayed expectation of gradient-based terms, we use the angle between the current gradient and the new gradient: this new gradient is computed from the direction orthogonal to the current gradient, which further helps us in determining a better adaptive learning rate based on angle history, thereby, leading to relatively better accuracy compared to the existing state-of-the-art optimizers. On a wide variety of benchmark datasets with prominent image classification architectures such as ResNet, DenseNet, EfficientNet, and VGG, we find that our method leads to the highest accuracy in most of the datasets. Moreover, we prove that our method is convergent.

artificial intelligence, deep learning, machine learning, (16 more...)

2304.10457

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > India > Telangana > Hyderabad (0.05)
Asia > Singapore (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Zhang, Gavin, Fattahi, Salar, Zhang, Richard Y.

Preconditioned Gradient Descent for Overparameterized Nonconvex Burer--Monteiro Factorization with Global Optimality Certification

We consider using gradient descent to minimize the nonconvex function $f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally rank deficient, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $r$ of the current iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $\phi$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $X^{\star}$.

artificial intelligence, machine learning, precgd, (16 more...)

2206.03345

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report > New Finding (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators

Jung, Victor J. B., Symons, Arne, Mei, Linyan, Verhelst, Marian, Benini, Luca

To meet the growing need for computational power for DNNs, multiple specialized hardware architectures have been proposed. Each DNN layer should be mapped onto the hardware with the most efficient schedule, however, SotA schedulers struggle to consistently provide optimum schedules in a reasonable time across all DNN-HW combinations. This paper proposes SALSA, a fast dual-engine scheduler to generate optimal execution schedules for both even and uneven mapping. We introduce a new strategy, combining exhaustive search with simulated annealing to address the dynamic nature of the loop ordering design space size across layers. SALSA is extensively benchmarked against two SotA schedulers, LOMA and Timeloop on 5 different DNNs, on average SALSA finds schedules with 11.9% and 7.6% lower energy while speeding up the search by 1.7x and 24x compared to LOMA and Timeloop, respectively.

artificial intelligence, machine learning, salsa, (14 more...)

2304.12931

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

DPAF: Image Synthesis via Differentially Private Aggregation in Forward Phase

Lin, Chih-Hsun, Hsu, Chia-Yi, Yu, Chia-Mu, Cao, Yang, Huang, Chun-Ying

Differentially private synthetic data is a promising alternative for sensitive data release. Many differentially private generative models have been proposed in the literature. Unfortunately, they all suffer from the low utility of the synthetic data, particularly for images of high resolutions. Here, we propose DPAF, an effective differentially private generative model for high-dimensional image synthesis. Different from the prior private stochastic gradient descent-based methods that add Gaussian noises in the backward phase during the model training, DPAF adds a differentially private feature aggregation in the forward phase, bringing advantages, including the reduction of information loss in gradient clipping and low sensitivity for the aggregation. Moreover, as an improper batch size has an adverse impact on the utility of synthetic data, DPAF also tackles the problem of setting a proper batch size by proposing a novel training strategy that asymmetrically trains different parts of the discriminator. We extensively evaluate different methods on multiple image datasets (up to images of 128x128 resolution) to demonstrate the performance of DPAF.

artificial intelligence, deep learning, machine learning, (19 more...)

2304.12185

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)