AITopics

2408.02839

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
Europe > Greece (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.90)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine LearningAug-4-2024

Sparks of Quantum Advantage and Rapid Retraining in Machine Learning

Troy, William

The advent of quantum computing holds the potential to revolutionize various fields by solving complex problems more efficiently than classical computers. Despite this promise, practical quantum advantage is hindered by current hardware limitations, notably the small number of qubits and high noise levels. In this study, we leverage adiabatic quantum computers to optimize Kolmogorov-Arnold Networks, a powerful neural network architecture for representing complex functions with minimal parameters. By modifying the network to use Bezier curves as the basis functions and formulating the optimization problem into a Quadratic Unconstrained Binary Optimization problem, we create a fixed-sized solution space, independent of the number of training samples. This strategy allows for the optimization of an entire neural network in a single training iteration in which, due to order of operations, a majority of the processing is done using a collapsed version of the training dataset. This inherently creates extremely fast training speeds, which are validated experimentally, compared to classical optimizers including Adam, Stochastic Gradient Descent, Adaptive Gradient, and simulated annealing. Additionally, we introduce a novel rapid retraining capability, enabling the network to be retrained with new data without reprocessing old samples, thus enhancing learning efficiency in dynamic environments. Experiments on retraining demonstrate a hundred times speed up using adiabatic quantum computing based optimization compared to that of the gradient descent based optimizers, with theoretical models allowing this speed up to be much larger! Our findings suggest that with further advancements in quantum hardware and algorithm optimization, quantum-optimized machine learning models could have broad applications across various domains, with initial focus on rapid retraining.

optimization, quantum annealer, zier curve, (14 more...)

2407.1602

Country: Europe > Switzerland (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Zekri, Oussama, Benechehab, Abdelhakim, Redko, Ievgen

Can LLMs predict the convergence of Stochastic Gradient Descent?

arXiv.org Machine LearningAug-3-2024

Large-language models are notoriously famous for their impressive performance across a wide range of tasks. One surprising example of such impressive performance is a recently identified capacity of LLMs to understand the governing principles of dynamical systems satisfying the Markovian property. In this paper, we seek to explore this direction further by studying the dynamics of stochastic gradient descent in convex and non-convex optimization. By leveraging the theoretical link between the SGD and Markov chains, we show a remarkable zero-shot performance of LLMs in predicting the local minima to which SGD converges for previously unseen starting points. On a more general level, we inquire about the possibility of using LLMs to perform zero-shot randomized trials for larger deep learning models used in practice.

convergence, markov chain, transition kernel, (9 more...)

2408.01736

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Nevada (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

arXiv.org Artificial IntelligenceAug-2-2024

Optimizing Variational Quantum Circuits Using Metaheuristic Strategies in Reinforcement Learning

Kölle, Michael, Seidl, Daniel, Zorn, Maximilian, Altmann, Philipp, Stein, Jonas, Gabor, Thomas

Quantum Reinforcement Learning (QRL) offers potential advantages over classical Reinforcement Learning, such as compact state space representation and faster convergence in certain scenarios. However, practical benefits require further validation. QRL faces challenges like flat solution landscapes, where traditional gradient-based methods are inefficient, necessitating the use of gradient-free algorithms. This work explores the integration of metaheuristic algorithms -- Particle Swarm Optimization, Ant Colony Optimization, Tabu Search, Genetic Algorithm, Simulated Annealing, and Harmony Search -- into QRL. These algorithms provide flexibility and efficiency in parameter optimization. Evaluations in $5\times5$ MiniGrid Reinforcement Learning environments show that, all algorithms yield near-optimal results, with Simulated Annealing and Particle Swarm Optimization performing best. In the Cart Pole environment, Simulated Annealing, Genetic Algorithms, and Particle Swarm Optimization achieve optimal results, while the others perform slightly better than random action selection. These findings demonstrate the potential of Particle Swarm Optimization and Simulated Annealing for efficient QRL learning, emphasizing the need for careful algorithm selection and adaptation.

algorithm, arctan, cart pole environment, (14 more...)

2408.01187

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.07)
North America > United States > Michigan (0.04)
Europe > France (0.04)
Asia > Middle East > Republic of Türkiye (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Hsiao, Yen-Che, Dutta, Abhishek

Hybrid Coordinate Descent for Efficient Neural Network Learning Using Line Search and Gradient Descent

arXiv.org Artificial IntelligenceAug-2-2024

This paper presents a novel coordinate descent algorithm leveraging a combination of one-directional line search and gradient information for parameter updates for a squared error loss function. Each parameter undergoes updates determined by either the line search or gradient method, contingent upon whether the modulus of the gradient of the loss with respect to that parameter surpasses a predefined threshold. Notably, a larger threshold value enhances algorithmic efficiency. Despite the potentially slower nature of the line search method relative to gradient descent, its parallelizability facilitates computational time reduction. Experimental validation conducted on a 2-layer Rectified Linear Unit network with synthetic data elucidates the impact of hyperparameters on convergence rates and computational efficiency.

coordinate descent, descent, neural network, (10 more...)

2408.01374

Country: North America > United States > Connecticut > Tolland County > Storrs (0.15)

Genre: Research Report > New Finding (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

arXiv.org Artificial IntelligenceJul-31-2024

Convergence rates of stochastic gradient method with independent sequences of step-size and momentum weight

Hwang, Wen-Liang

In large-scale learning algorithms, the momentum term is usually included in the stochastic sub-gradient method to improve the learning speed because it can navigate ravines efficiently to reach a local minimum. However, step-size and momentum weight hyper-parameters must be appropriately tuned to optimize convergence. We thus analyze the convergence rate using stochastic programming with Polyak's acceleration of two commonly used step-size learning rates: ``diminishing-to-zero" and ``constant-and-drop" (where the sequence is divided into stages and a constant step-size is applied at each stage) under strongly convex functions over a compact convex set with bounded sub-gradients. For the former, we show that the convergence rate can be written as a product of exponential in step-size and polynomial in momentum weight. Our analysis justifies the convergence of using the default momentum weight setting and the diminishing-to-zero step-size sequence in large-scale machine learning software. For the latter, we present the condition for the momentum weight sequence to converge at each stage.

independent sequence, step-size and momentum weight, stochastic gradient method, (1 more...)

2408.02678

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

arXiv.org Artificial IntelligenceJul-30-2024

An Effective Dynamic Gradient Calibration Method for Continual Learning

Lin, Weichen, Chen, Jiaxiang, Huang, Ruomin, Ding, Hu

Continual learning (CL) is a fundamental topic in machine learning, where the goal is to train a model with continuously incoming data and tasks. Due to the memory limit, we cannot store all the historical data, and therefore confront the ``catastrophic forgetting'' problem, i.e., the performance on the previous tasks can substantially decrease because of the missing information in the latter period. Though a number of elegant methods have been proposed, the catastrophic forgetting phenomenon still cannot be well avoided in practice. In this paper, we study the problem from the gradient perspective, where our aim is to develop an effective algorithm to calibrate the gradient in each updating step of the model; namely, our goal is to guide the model to be updated in the right direction under the situation that a large amount of historical data are unavailable. Our idea is partly inspired by the seminal stochastic variance reduction methods (e.g., SVRG and SAGA) for reducing the variance of gradient estimation in stochastic gradient descent algorithms. Another benefit is that our approach can be used as a general tool, which is able to be incorporated with several existing popular CL methods to achieve better performance. We also conduct a set of experiments on several benchmark datasets to evaluate the performance in practice.

continual learning, dgc, effective dynamic gradient calibration method, (10 more...)

2407.20956

Country:

Asia > China > Anhui Province (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(4 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Dereich, Steffen, Jentzen, Arnulf

Convergence rates for the Adam optimizer

arXiv.org Machine LearningJul-29-2024

Stochastic gradient descent (SGD) optimization methods are nowadays the method of choice for the training of deep neural networks (DNNs) in artificial intelligence systems. In practically relevant training problems, usually not the plain vanilla standard SGD method is the employed optimization scheme but instead suitably accelerated and adaptive SGD optimization methods are applied. As of today, maybe the most popular variant of such accelerated and adaptive SGD optimization methods is the famous Adam optimizer proposed by Kingma & Ba in 2014. Despite the popularity of the Adam optimizer in implementations, it remained an open problem of research to provide a convergence analysis for the Adam optimizer even in the situation of simple quadratic stochastic optimization problems where the objective function (the function one intends to minimize) is strongly convex. In this work we solve this problem by establishing optimal convergence rates for the Adam optimizer for a large class of stochastic optimization problems, in particular, covering simple quadratic stochastic optimization problems. The key ingredient of our convergence analysis is a new vector field function which we propose to refer to as the Adam vector field. This Adam vector field accurately describes the macroscopic behaviour of the Adam optimization process but differs from the negative gradient of the objective function (the function we intend to minimize) of the considered stochastic optimization problem. In particular, our convergence analysis reveals that the Adam optimizer does typically not converge to critical points of the objective function (zeros of the gradient of the objective function) of the considered optimization problem but converges with rates to zeros of this Adam vector field.

adam optimizer, convergence rate

2407.21078

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.53)

Waghela, Hetvi, Sen, Jaydip, Rakshit, Sneha

Enhancing Adversarial Text Attacks on BERT Models with Projected Gradient Descent

arXiv.org Artificial IntelligenceJul-29-2024

Adversarial attacks against deep learning models represent a major threat to the security and reliability of natural language processing (NLP) systems. In this paper, we propose a modification to the BERT-Attack framework, integrating Projected Gradient Descent (PGD) to enhance its effectiveness and robustness. The original BERT-Attack, designed for generating adversarial examples against BERT-based models, suffers from limitations such as a fixed perturbation budget and a lack of consideration for semantic similarity. The proposed approach in this work, PGD-BERT-Attack, addresses these limitations by leveraging PGD to iteratively generate adversarial examples while ensuring both imperceptibility and semantic similarity to the original input. Extensive experiments are conducted to evaluate the performance of PGD-BERT-Attack compared to the original BERT-Attack and other baseline methods. The results demonstrate that PGD-BERT-Attack achieves higher success rates in causing misclassification while maintaining low perceptual changes. Furthermore, PGD-BERT-Attack produces adversarial instances that exhibit greater semantic resemblance to the initial input, enhancing their applicability in real-world scenarios. Overall, the proposed modification offers a more effective and robust approach to adversarial attacks on BERT-based models, thus contributing to the advancement of defense against attacks on NLP systems.

adversarial example, bert-attack, perturbation, (16 more...)

2407.21073

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > India > West Bengal > Kolkata (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(6 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.71)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Follain, Bertille, Bach, Francis

Enhanced Feature Learning via Regularisation: Integrating Neural Networks and Kernel Methods

arXiv.org Machine LearningJul-24-2024

We propose a new method for feature learning and function estimation in supervised learning via regularised empirical risk minimisation. Our approach considers functions as expectations of Sobolev functions over all possible one-dimensional projections of the data. This framework is similar to kernel ridge regression, where the kernel is $\mathbb{E}_w ( k^{(B)}(w^\top x,w^\top x^\prime))$, with $k^{(B)}(a,b) := \min(|a|, |b|)1_{ab>0}$ the Brownian kernel, and the distribution of the projections $w$ is learnt. This can also be viewed as an infinite-width one-hidden layer neural network, optimising the first layer's weights through gradient descent and explicitly adjusting the non-linearity and weights of the second layer. We introduce an efficient computation method for the estimator, called Brownian Kernel Neural Network (BKerNN), using particles to approximate the expectation. The optimisation is principled due to the positive homogeneity of the Brownian kernel. Using Rademacher complexity, we show that BKerNN's expected risk converges to the minimal risk with explicit high-probability rates of $O( \min((d/n)^{1/2}, n^{-1/6}))$ (up to logarithmic factors). Numerical experiments confirm our optimisation intuitions, and BKerNN outperforms kernel ridge regression, and favourably compares to a one-hidden layer neural network with ReLU activations in various settings and real data sets.

bkernn, neural network, penalty, (12 more...)

2407.1728

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)