AITopics

The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees for the same asynchronous SGD algorithm regardless of the delays in the gradients, depending instead just on the number of parallel devices used to implement the algorithm. Our guarantees are strictly better than the existing analyses, and we also argue that asynchronous SGD outperforms synchronous minibatch SGD in the settings we consider. For our analysis, we introduce a novel recursion based on "virtual iterates" and delay-adaptive stepsizes, which allow us to derive state-of-the-art guarantees for both convex and non-convex objectives.

artificial intelligence, machine learning, prev, (14 more...)

2206.07638

Country:

Asia > Middle East > Jordan (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Avdiukhin, Dmitrii, Yaroslavtsev, Grigory

HOUDINI: Escaping from Moderately Constrained Saddles

We give the first polynomial time algorithms for escaping from high-dimensional saddle points under a moderate number of constraints. Given gradient access to a smooth function $f \colon \mathbb R^d \to \mathbb R$ we show that (noisy) gradient descent methods can escape from saddle points under a logarithmic number of inequality constraints. This constitutes the first tangible progress (without reliance on NP-oracles or altering the definitions to only account for certain constraints) on the main open question of the breakthrough work of Ge et al. who showed an analogous result for unconstrained and equality-constrained problems. Our results hold for both regular and stochastic gradient descent.

artificial intelligence, constraint, machine learning, (16 more...)

2205.13753

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(7 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

Danisetty, Sachin Kumar, Mylaram, Santhosh Reddy, Kumar, Pawan

Adaptive Consensus Optimization Method for GANs

We propose a second order gradient based method with ADAM and RMSprop for the training of generative adversarial networks. The proposed method is fastest to obtain similar accuracy when compared to prominent second order methods. Unlike state-of-the-art recent methods, it does not require solving a linear system, or it does not require additional mixed second derivative terms. We derive the fixed point iteration corresponding to proposed method, and show that the proposed method is convergent. The proposed method produces better or comparable inception scores, and comparable quality of images compared to other recently proposed state-of-the-art second order methods. Compared to first order methods such as ADAM, it produces significantly better inception scores. The proposed method is compared and validated on popular datasets such as FFHQ, LSUN, CIFAR10, MNIST, and Fashion MNIST for image generation tasks\footnote{Accepted in IJCNN 2023}. Codes: \url{https://github.com/misterpawan/acom}

artificial intelligence, machine learning, optimization problem, (18 more...)

2304.10317

Country:

Asia > India > Telangana > Hyderabad (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Federated Compositional Deep AUC Maximization

Zhang, Xinwen, Zhang, Yihan, Yang, Tianbao, Souvenir, Richard, Gao, Hongchang

Federated learning has attracted increasing attention due to the promise of balancing privacy and large-scale learning; numerous approaches have been proposed. However, most existing approaches focus on problems with balanced data, and prediction performance is far from satisfactory for many real-world applications where the number of samples in different classes is highly imbalanced. To address this challenging problem, we developed a novel federated learning method for imbalanced data by directly optimizing the area under curve (AUC) score. In particular, we formulate the AUC maximization problem as a federated compositional minimax optimization problem, develop a local stochastic compositional gradient descent ascent with momentum algorithm, and provide bounds on the computational and communication complexities of our algorithm. To the best of our knowledge, this is the first work to achieve such favorable theoretical results. Finally, extensive experimental results confirm the efficacy of our method.

algorithm, artificial intelligence, machine learning, (17 more...)

2304.10101

Country: North America > United States > Texas (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.37)

Velasevic, Boris, Parasnis, Rohit, Brinton, Christopher G., Azizan, Navid

On the Effects of Data Heterogeneity on the Convergence Rates of Distributed Linear System Solvers

We consider the fundamental problem of solving a large-scale system of linear equations. In particular, we consider the setting where a taskmaster intends to solve the system in a distributed/federated fashion with the help of a set of machines, who each have a subset of the equations. Although there exist several approaches for solving this problem, missing is a rigorous comparison between the convergence rates of the projection-based methods and those of the optimization-based ones. In this paper, we analyze and compare these two classes of algorithms with a particular focus on the most efficient method from each class, namely, the recently proposed Accelerated Projection-Based Consensus (APC) and the Distributed Heavy-Ball Method (D-HBM). To this end, we first propose a geometric notion of data heterogeneity called angular heterogeneity and discuss its generality. Using this notion, we bound and compare the convergence rates of the studied algorithms and capture the effects of both cross-machine and local data heterogeneity on these quantities. Our analysis results in a number of novel insights besides showing that APC is the most efficient method in realistic scenarios where there is a large data heterogeneity. Our numerical analyses validate our theoretical results.

convergence rate, denote, heterogeneity, (16 more...)

2304.1064

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)

arXiv.org Machine LearningApr-20-2023

Projective Proximal Gradient Descent for A Class of Nonconvex Nonsmooth Optimization Problems: Fast Convergence Without Kurdyka-Lojasiewicz (KL) Property

Yang, Yingzhen, Li, Ping

Nonconvex and nonsmooth optimization problems are important and challenging for statistics and machine learning. In this paper, we propose Projected Proximal Gradient Descent (PPGD) which solves a class of nonconvex and nonsmooth optimization problems, where the nonconvexity and nonsmoothness come from a nonsmooth regularization term which is nonconvex but piecewise convex. In contrast with existing convergence analysis of accelerated PGD methods for nonconvex and nonsmooth problems based on the Kurdyka-\L{}ojasiewicz (K\L{}) property, we provide a new theoretical analysis showing local fast convergence of PPGD. It is proved that PPGD achieves a fast convergence rate of $\cO(1/k^2)$ when the iteration number $k \ge k_0$ for a finite $k_0$ on a class of nonconvex and nonsmooth problems under mild assumptions, which is locally Nesterov's optimal convergence rate of first-order methods on smooth and convex objective function with Lipschitz continuous gradient. Experimental results demonstrate the effectiveness of PPGD.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

2304.10499

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > Arizona > Maricopa County > Tempe (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Jeong, Halyun, Needell, Deanna

Linear Convergence of Reshuffling Kaczmarz Methods With Sparse Constraints

arXiv.org Machine LearningApr-20-2023

The Kaczmarz method (KZ) and its variants, which are types of stochastic gradient descent (SGD) methods, have been extensively studied due to their simplicity and efficiency in solving linear equation systems. The iterative thresholding (IHT) method has gained popularity in various research fields, including compressed sensing or sparse linear regression, machine learning with additional structure, and optimization with nonconvex constraints. Recently, a hybrid method called Kaczmarz-based IHT (KZIHT) has been proposed, combining the benefits of both approaches, but its theoretical guarantees are missing. In this paper, we provide the first theoretical convergence guarantees for KZIHT by showing that it converges linearly to the solution of a system with sparsity constraints up to optimal statistical bias when the reshuffling data sampling scheme is used. We also propose the Kaczmarz with periodic thresholding (KZPT) method, which generalizes KZIHT by applying the thresholding operation for every certain number of KZ iterations and by employing two different types of step sizes. We establish a linear convergence guarantee for KZPT for randomly subsampled bounded orthonormal systems (BOS) and mean-zero isotropic sub-Gaussian random matrices, which are most commonly used models in compressed sensing, dimension reduction, matrix sketching, and many inverse problems in neural networks. Our analysis shows that KZPT with an optimal thresholding period outperforms KZIHT. To support our theory, we include several numerical experiments.

artificial intelligence, kziht, machine learning, (16 more...)

arXiv.org Machine Learning

2304.10123

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Wang, Tianyu, Feng, Yasong

Convergence Rates of Stochastic Zeroth-order Gradient Descent for \L ojasiewicz Functions

arXiv.org Artificial IntelligenceApr-19-2023

Zeroth order optimization is a central topic in optimization and related fields. Algorithms for zeroth order optimization find important real-world applications, since often times in practice, we cannot directly access the derivatives of the objective function. To optimize the function in such scenarios, one can estimate the gradient/Hessian first and deploy first/second order algorithms with the estimated derivatives. Previously, many authors have considered this problem. Yet stochastic zeroth order methods for Łojasiewicz functions have not been carefully investigated (See Section 2 for more discussion).

artificial intelligence, converge, machine learning, (14 more...)

2210.16997

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.51)

Gentinetta, Gian, Sutter, David, Zoufal, Christa, Fuller, Bryce, Woerner, Stefan

Quantum Kernel Alignment with Stochastic Gradient Descent

arXiv.org Artificial IntelligenceApr-19-2023

Quantum support vector machines have the potential to achieve a quantum speedup for solving certain machine learning problems. The key challenge for doing so is finding good quantum kernels for a given data set -- a task called kernel alignment. In this paper we study this problem using the Pegasos algorithm, which is an algorithm that uses stochastic gradient descent to solve the support vector machine optimization problem. We extend Pegasos to the quantum case and and demonstrate its effectiveness for kernel alignment. Unlike previous work which performs kernel alignment by training a QSVM within an outer optimization loop, we show that using Pegasos it is possible to simultaneously train the support vector machine and align the kernel. Our experiments show that this approach is capable of aligning quantum feature maps with high accuracy, and outperforms existing quantum kernel alignment techniques. Specifically, we demonstrate that Pegasos is particularly effective for non-stationary data, which is an important challenge in real-world applications.

kernel alignment, pegaso, support vector machine, (13 more...)

doi: 10.1109/QCE57702.2023.00036

2304.09899

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

arXiv.org Artificial IntelligenceApr-18-2023

Particle-based Variational Inference with Preconditioned Functional Gradient Flow

Dong, Hanze, Wang, Xi, Lin, Yong, Zhang, Tong

Particle-based variational inference (VI) minimizes the KL divergence between model samples and the target posterior with gradient flow estimates. With the popularity of Stein variational gradient descent (SVGD), the focus of particle-based VI algorithms has been on the properties of functions in Reproducing Kernel Hilbert Space (RKHS) to approximate the gradient flow. However, the requirement of RKHS restricts the function class and algorithmic flexibility. This paper offers a general solution to this problem by introducing a functional regularization term that encompasses the RKHS norm as a special case. This allows us to propose a new particle-based VI algorithm called preconditioned functional gradient flow (PFG). Compared to SVGD, PFG has several advantages. It has a larger function class, improved scalability in large particle-size scenarios, better adaptation to ill-conditioned distributions, and provable continuous-time convergence in KL divergence. Additionally, non-linear function classes such as neural networks can be incorporated to estimate the gradient flow. Our theory and experiments demonstrate the effectiveness of the proposed framework.

algorithm, artificial intelligence, machine learning, (17 more...)

2211.13954

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)