AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

EnsLoss: Stochastic Calibrated Loss Ensembles for Preventing Overfitting in Classification

arXiv.org Machine LearningSep-3-2024

Empirical risk minimization (ERM) with a computationally feasible surrogate loss is a widely accepted approach for classification. Notably, the convexity and calibration (CC) properties of a loss function ensure consistency of ERM in maximizing accuracy, thereby offering a wide range of options for surrogate losses. In this article, we propose a novel ensemble method, namely EnsLoss, which extends the ensemble learning concept to combine loss functions within the ERM framework. A key feature of our method is the consideration on preserving the "legitimacy" of the combined losses, i.e., ensuring the CC properties. Specifically, we first transform the CC conditions of losses into loss-derivatives, thereby bypassing the need for explicit loss functions and directly generating calibrated loss-derivatives. Therefore, inspired by Dropout, EnsLoss enables loss ensembles through one training process with doubly stochastic gradient descent (i.e., random batch samples and random calibrated loss-derivatives). We theoretically establish the statistical consistency of our approach and provide insights into its benefits. The numerical effectiveness of EnsLoss compared to fixed loss methods is demonstrated through experiments on a broad range of 14 OpenML tabular datasets and 46 image datasets with various deep learning architectures. Python repository and source code are available on GitHub at https://github.com/statmlben/ensloss.

artificial intelligence, machine learning, stochastic calibrated loss ensemble, (3 more...)

arXiv.org Machine Learning

2409.00908

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.53)

Add feedback

Bootstrap SGD: Algorithmic Stability and Robustness

Christmann, Andreas, Lei, Yunwen

arXiv.org Machine LearningSep-2-2024

In this paper some methods to use the empirical bootstrap approach for stochastic gradient descent (SGD) to minimize the empirical risk over a separable Hilbert space are investigated from the view point of algorithmic stability and statistical robustness. The first two types of approaches are based on averages and are investigated from a theoretical point of view. A generalization analysis for bootstrap SGD of Type 1 and Type 2 based on algorithmic stability is done. Another type of bootstrap SGD is proposed to demonstrate that it is possible to construct purely distribution-free pointwise confidence intervals of the median curve using bootstrap SGD.

bootstrap sample, bootstrap sgd, stability, (14 more...)

arXiv.org Machine Learning

2409.01074

Country:

North America > United States > New York (0.04)
Europe > Germany > Bavaria > Upper Franconia > Bayreuth (0.04)
Asia > China > Hong Kong (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Compatible Gradient Approximations for Actor-Critic Algorithms

Saglam, Baturay, Kalogerias, Dionysis

arXiv.org Artificial IntelligenceSep-2-2024

Deterministic policy gradient algorithms are foundational for actor-critic methods in controlling continuous systems, yet they often encounter inaccuracies due to their dependence on the derivative of the critic's value estimates with respect to input actions. This reliance requires precise action-value gradient computations, a task that proves challenging under function approximation. We introduce an actor-critic algorithm that bypasses the need for such precision by employing a zeroth-order approximation of the action-value gradient through two-point stochastic gradient estimation within the action space. This approach provably and effectively addresses compatibility issues inherent in deterministic policy gradient schemes. Empirical results further demonstrate that our algorithm not only matches but frequently exceeds the performance of current state-of-the-art methods.

algorithm, approximation, gradient, (15 more...)

arXiv.org Artificial Intelligence

2409.01477

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

Improving Adaptivity via Over-Parameterization in Sequence Models

Li, Yicheng, Lin, Qian

arXiv.org Machine LearningSep-1-2024

It is well known that eigenfunctions of a kernel play a crucial role in kernel regression. Through several examples, we demonstrate that even with the same set of eigenfunctions, the order of these functions significantly impacts regression outcomes. Simplifying the model by diagonalizing the kernel, we introduce an over-parameterized gradient descent in the realm of sequence model to capture the effects of various orders of a fixed set of eigen-functions. This method is designed to explore the impact of varying eigenfunction orders. Our theoretical results show that the over-parameterization gradient flow can adapt to the underlying structure of the signal and significantly outperform the vanilla gradient flow method. Moreover, we also demonstrate that deeper over-parameterization can further enhance the generalization capability of the model. These results not only provide a new perspective on the benefits of over-parameterization and but also offer insights into the adaptivity and generalization potential of neural networks beyond the kernel regime.

eigenvalue, gradient descent, neural network, (16 more...)

arXiv.org Machine Learning

2409.00894

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > California (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

Add feedback

Towards Hyper-parameter-free Federated Learning

Geetika, null, Uniyal, Drishya, Chatterjee, Bapi

arXiv.org Artificial IntelligenceAug-30-2024

The adaptive synchronization techniques in federated learning (FL) for scaled global model updates show superior performance over the vanilla federated averaging (FedAvg) scheme. However, existing methods employ additional tunable hyperparameters on the server to determine the scaling factor. A contrasting approach is automated scaling analogous to tuning-free step-size schemes in stochastic gradient descent (SGD) methods, which offer competitive convergence rates and exhibit good empirical performance. In this work, we introduce two algorithms for automated scaling of global model updates. In our first algorithm, we establish that a descent-ensuring step-size regime at the clients ensures descent for the server objective. We show that such a scheme enables linear convergence for strongly convex federated objectives. Our second algorithm shows that the average of objective values of sampled clients is a practical and effective substitute for the objective function value at the server required for computing the scaling factor, whose computation is otherwise not permitted. Our extensive empirical results show that the proposed methods perform at par or better than the popular federated learning algorithms for both convex and non-convex problems. Our work takes a step towards designing hyper-parameter-free federated learning.

algorithm, hyper-parameter-free federated learning, objective, (13 more...)

arXiv.org Artificial Intelligence

2408.17145

Country:

Asia > India > NCT > New Delhi (0.04)
Asia > India > NCT > Delhi (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Government (0.46)
Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Add feedback

High-Dimensional Sparse Data Low-rank Representation via Accelerated Asynchronous Parallel Stochastic Gradient Descent

Hu, Qicong, Wu, Hao

arXiv.org Artificial IntelligenceAug-29-2024

Data characterized by high dimensionality and sparsity are commonly used to describe real-world node interactions. Low-rank representation (LR) can map high-dimensional sparse (HDS) data to low-dimensional feature spaces and infer node interactions via modeling data latent associations. Unfortunately, existing optimization algorithms for LR models are computationally inefficient and slowly convergent on large-scale datasets. To address this issue, this paper proposes an Accelerated Asynchronous Parallel Stochastic Gradient Descent A2PSGD for High-Dimensional Sparse Data Low-rank Representation with three fold-ideas: a) establishing a lock-free scheduler to simultaneously respond to scheduling requests from multiple threads; b) introducing a greedy algorithm-based load balancing strategy for balancing the computational load among threads; c) incorporating Nesterov's accelerated gradient into the learning scheme to accelerate model convergence. Empirical studies show that A2PSGD outperforms existing optimization algorithms for HDS data LR in both accuracy and training time.

factorization, ieee transaction, lr model, (13 more...)

arXiv.org Artificial Intelligence

2408.16592

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > China > Beijing > Beijing (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(13 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Energy (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

A Minibatch-SGD-Based Learning Meta-Policy for Inventory Systems with Myopic Optimal Policy

Lyu, Jiameng, Xie, Jinxing, Yuan, Shilin, Zhou, Yuan

arXiv.org Artificial IntelligenceAug-28-2024

Stochastic gradient descent (SGD) has proven effective in solving many inventory control problems with demand learning. However, it often faces the pitfall of an infeasible target inventory level that is lower than the current inventory level. Several recent works (e.g., Huh and Rusmevichientong (2009), Shi et al.(2016)) are successful to resolve this issue in various inventory systems. However, their techniques are rather sophisticated and difficult to be applied to more complicated scenarios such as multi-product and multi-constraint inventory systems. In this paper, we address the infeasible-target-inventory-level issue from a new technical perspective -- we propose a novel minibatch-SGD-based meta-policy. Our meta-policy is flexible enough to be applied to a general inventory systems framework covering a wide range of inventory management problems with myopic clairvoyant optimal policy. By devising the optimal minibatch scheme, our meta-policy achieves a regret bound of $\mathcal{O}(\sqrt{T})$ for the general convex case and $\mathcal{O}(\log T)$ for the strongly convex case. To demonstrate the power and flexibility of our meta-policy, we apply it to three important inventory control problems: multi-product and multi-constraint systems, multi-echelon serial systems, and one-warehouse and multi-store systems by carefully designing application-specific subroutines.We also conduct extensive numerical experiments to demonstrate that our meta-policy enjoys competitive regret performance, high computational efficiency, and low variances among a wide range of applications.

algorithm, inventory system, minibatch-sgd-based meta-policy, (13 more...)

arXiv.org Artificial Intelligence

2408.16181

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Add feedback

Single-Loop Deterministic and Stochastic Interior-Point Algorithms for Nonlinearly Constrained Optimization

Curtis, Frank E., Jiang, Xin, Wang, Qi

arXiv.org Artificial IntelligenceAug-28-2024

An interior-point algorithm framework is proposed, analyzed, and tested for solving nonlinearly constrained continuous optimization problems. The main setting of interest is when the objective and constraint functions may be nonlinear and/or nonconvex, and when constraint values and derivatives are tractable to compute, but objective function values and derivatives can only be estimated. The algorithm is intended primarily for a setting that is similar for stochastic-gradient methods for unconstrained optimization, namely, the setting when stochastic-gradient estimates are available and employed in place of gradients of the objective, and when no objective function values (nor estimates of them) are employed. This is achieved by the interior-point framework having a single-loop structure rather than the nested-loop structure that is typical of contemporary interior-point methods. For completeness, convergence guarantees for the framework are provided both for deterministic and stochastic settings. Numerical experiments show that the algorithm yields good performance on a large set of test problems.

algorithm, algorithm 1, constraint, (14 more...)

arXiv.org Artificial Intelligence

2408.16186

Country: North America > United States > New York (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Effect of Adaptation Rate and Cost Display in a Human-AI Interaction Game

Isa, Jason T., Wu, Bohan, Wang, Qirui, Zhang, Yilin, Burden, Samuel A., Ratliff, Lillian J., Chasnov, Benjamin J.

arXiv.org Artificial IntelligenceAug-26-2024

As interactions between humans and AI become more prevalent, it is critical to have better predictors of human behavior in these interactions. We investigated how changes in the AI's adaptive algorithm impact behavior predictions in two-player continuous games. In our experiments, the AI adapted its actions using a gradient descent algorithm under different adaptation rates while human participants were provided cost feedback. The cost feedback was provided by one of two types of visual displays: (a) cost at the current joint action vector, or (b) cost in a local neighborhood of the current joint action vector. Our results demonstrate that AI adaptation rate can significantly affect human behavior, having the ability to shift the outcome between two game theoretic equilibrium. We observed that slow adaptation rates shift the outcome towards the Nash equilibrium, while fast rates shift the outcome towards the human-led Stackelberg equilibrium. The addition of localized cost information had the effect of shifting outcomes towards Nash, compared to the outcomes from cost information at only the current joint action vector. Future work will investigate other effects that influence the convergence of gradient descent games.

adaptation rate, equilibrium, participant, (15 more...)

arXiv.org Artificial Intelligence

2408.1464

Country:

North America > United States > Washington > King County > Seattle (0.04)
Europe > United Kingdom (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Malaysia (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Decision-Focused Learning to Predict Action Costs for Planning

Mandi, Jayanta, Foschini, Marco, Holler, Daniel, Thiebaux, Sylvie, Hoffmann, Jorg, Guns, Tias

arXiv.org Artificial IntelligenceAug-26-2024

In many automated planning applications, action costs can be hard to specify. An example is the time needed to travel through a certain road segment, which depends on many factors, such as the current weather conditions. A natural way to address this issue is to learn to predict these parameters based on input features (e.g., weather forecasts) and use the predicted action costs in automated planning afterward. Decision-Focused Learning (DFL) has been successful in learning to predict the parameters of combinatorial optimization problems in a way that optimizes solution quality rather than prediction quality. This approach yields better results than treating prediction and optimization as separate tasks. In this paper, we investigate for the first time the challenges of implementing DFL for automated planning in order to learn to predict the action costs. There are two main challenges to overcome: (1) planning systems are called during gradient descent learning, to solve planning problems with negative action costs, which are not supported in planning. We propose novel methods for gradient computation to avoid this issue. (2) DFL requires repeated planner calls during training, which can limit the scalability of the method. We experiment with different methods approximating the optimal plan as well as an easy-to-implement caching mechanism to speed up the learning process. As the first work that addresses DFL for automated planning, we demonstrate that the proposed gradient computation consistently yields significantly better plans than predictions aimed at minimizing prediction error; and that caching can temper the computation requirements.

action cost, planning problem, prediction, (15 more...)

arXiv.org Artificial Intelligence

2408.06876

Country:

Europe > Germany > Saarland (0.05)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
Oceania > Australia (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry: Transportation (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback