AITopics | exponential weight

Fast Routing under Uncertainty: Adaptive Learning in Congestion Games via Exponential Weights

Neural Information Processing Systems, Dec-24-2025, 08:38:46 GMT

We examine an adaptive learning framework for nonatomic congestion games where the players' cost functions may be subject to exogenous fluctuations (e.g., due to disturbances in the network or variations in the traffic going through a link). In this setting, the popular multiplicative/exponential weights algorithm enjoys an $\mathcal{O}(1/\sqrt{T})$ equilibrium convergence rate; however, this rate is suboptimal in static environments, i.e., when the network is not subject to randomness. In this static regime, accelerated algorithms achieve an $\mathcal{O}(1/T^{2})$ convergence speed, but they fail to converge altogether in stochastic problems. To fill this gap, we propose a novel, adaptive exponential weights method, dubbed AdaWeight, that seamlessly interpolates between the $\mathcal{O}(1/T^{2})$ and $\mathcal{O}(1/\sqrt{T})$ rates in the static and stochastic regimes, respectively. Importantly, this best-of-both-worlds guarantee does not require any prior knowledge of the problem's parameters or tuning by the optimizer; in addition, the method's convergence speed depends subquadratically on the size of the network (number of vertices and edges), so it scales gracefully to large, real-life urban networks.
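For orientation, the vanilla exponential weights update that the abstract uses as its baseline can be sketched on a toy two-route network. This is not AdaWeight: the cost functions, the step size `eta`, and the function name `exponential_weights_routing` are assumptions chosen purely for illustration.

```python
import math

def exponential_weights_routing(eta=0.5, rounds=200):
    """Vanilla exponential weights on a hypothetical two-route network.

    A unit mass of traffic splits as (x1, x2). The illustrative costs are
    c1 = 2 * x1 and c2 = 1 + x2, so the equilibrium split is (2/3, 1/3).
    """
    scores = [0.0, 0.0]  # negative cumulative cost of each route
    flow = [0.5, 0.5]
    for _ in range(rounds):
        # Flow is proportional to exp(score), i.e. a softmax of the scores.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        flow = [w / total for w in weights]
        # Static (noise-free) costs induced by the current flow.
        costs = [2.0 * flow[0], 1.0 + flow[1]]
        # Penalise each route by its cost, scaled by the step size.
        scores = [s - eta * c for s, c in zip(scores, costs)]
    return flow

flow = exponential_weights_routing()
print(flow)
```

In this static toy instance the flow approaches the split (2/3, 1/3), at which both routes cost the same. With noisy costs a fixed step size of this kind would generally need to shrink over time, which is the tuning burden the abstract says the adaptive method removes.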

adaptive learning, congestion game, fast routing, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Learning the Learning Rate for Prediction with Expert Advice

Neural Information Processing Systems, Oct-2-2025, 22:06:02 GMT

Most standard algorithms for prediction with expert advice depend on a parameter called the learning rate. This learning rate needs to be large enough to fit the data well, but small enough to prevent overfitting. For the exponential weights algorithm, a sequence of prior work has established theoretical guarantees for higher and higher data-dependent tunings of the learning rate, which allow for increasingly aggressive learning. But in practice such theoretical tunings often still perform worse (as measured by their regret) than ad hoc tuning with an even higher learning rate. To close the gap between theory and practice, we introduce an approach to learn the learning rate. Up to a factor that is at most (poly)logarithmic in the number of experts and the inverse of the learning rate, our method performs as well as if we knew the empirically best learning rate from a large range that includes both conservative small values and values that are much higher than those for which formal guarantees were previously available. Our method employs a grid of learning rates, yet runs in linear time regardless of the size of the grid.
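To make the role of the learning rate concrete, the sketch below runs plain exponential weights once per value in a small grid and reports the regret of each run. It is a naive baseline, not the method of the paper: it costs one full run per grid point, and the synthetic losses, the grid, and the name `hedge_regret` are assumptions for illustration only.

```python
import math
import random

def hedge_regret(losses, eta):
    """Run exponential weights with a fixed learning rate eta and return its regret."""
    k = len(losses[0])
    cum = [0.0] * k        # cumulative loss of each expert
    learner_loss = 0.0
    for loss in losses:
        # Weights are proportional to exp(-eta * cumulative loss);
        # subtracting the minimum keeps the exponentials well scaled.
        low = min(cum)
        w = [math.exp(-eta * (c - low)) for c in cum]
        z = sum(w)
        p = [wi / z for wi in w]
        learner_loss += sum(pi * li for pi, li in zip(p, loss))
        cum = [c + li for c, li in zip(cum, loss)]
    return learner_loss - min(cum)

rng = random.Random(0)
T, K = 500, 5
# Hypothetical losses in [0, 1]; expert 0 is slightly better on average.
losses = [[min(1.0, rng.random() + (0.0 if i == 0 else 0.1)) for i in range(K)]
          for _ in range(T)]
etas = [2.0 ** -j for j in range(-2, 6)]   # grid from 4 down to 1/32
regrets = {eta: hedge_regret(losses, eta) for eta in etas}
best_eta = min(regrets, key=regrets.get)
for eta in etas:
    print(f"eta={eta:<8} regret={regrets[eta]:.3f}")
print("empirically best eta on this data:", best_eta)
```

Each run satisfies the classical fixed-rate bound of ln(K)/eta + eta*T/8 for losses in [0, 1], but the rate that is best on a given data set can be much larger than the one minimising that bound, which is the gap between theory and practice the abstract describes.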

algorithm, cumulative mixability gap, mixability gap, (14 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > South Holland > Leiden (0.04)
Oceania > Australia > Queensland (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Fast Routing under Uncertainty: Adaptive Learning in Congestion Games via Exponential Weights

Neural Information Processing Systems, Oct-11-2024, 09:10:05 GMT

We examine an adaptive learning framework for nonatomic congestion games where the players' cost functions may be subject to exogenous fluctuations (e.g., due to disturbances in the network or variations in the traffic going through a link). In this setting, the popular multiplicative/exponential weights algorithm enjoys an $\mathcal{O}(1/\sqrt{T})$ equilibrium convergence rate; however, this rate is suboptimal in static environments, i.e., when the network is not subject to randomness. In this static regime, accelerated algorithms achieve an $\mathcal{O}(1/T^{2})$ convergence speed, but they fail to converge altogether in stochastic problems. To fill this gap, we propose a novel, adaptive exponential weights method, dubbed AdaWeight, that seamlessly interpolates between the $\mathcal{O}(1/T^{2})$ and $\mathcal{O}(1/\sqrt{T})$ rates in the static and stochastic regimes, respectively. Importantly, this "best-of-both-worlds" guarantee does not require any prior knowledge of the problem's parameters or tuning by the optimizer; in addition, the method's convergence speed depends subquadratically on the size of the network (number of vertices and edges), so it scales gracefully to large, real-life urban networks.

adaptive learning, congestion game, exponential weight, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.79)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.40)

Optimization, Learning, and Games with Predictable Sequences

Neural Information Processing Systems, Mar-13-2024, 23:18:25 GMT

We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to Hölder-smooth functions, and apply the results to saddle-point type problems. Next, we prove that a version of Optimistic Mirror Descent (which has a close relation to the Exponential Weights algorithm) can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of $O((\log T)/T)$.
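The idea of a predictable sequence can be illustrated with an optimistic variant of exponential weights in self-play: each player counts the most recent loss vector twice, as a guess for the next one. The sketch below is a simplified fixed-step version, not the exact algorithm analysed in the paper; the payoff matrix `A`, the step size `eta`, and the function names are assumptions for illustration.

```python
import math

def softmax(scores):
    top = max(scores)
    w = [math.exp(s - top) for s in scores]
    z = sum(w)
    return [wi / z for wi in w]

def optimistic_hedge_gap(A, eta=0.25, rounds=2000):
    """Duality gap of the average strategies after optimistic self-play.

    The row player minimises x^T A y and the column player maximises it.
    """
    n, m = len(A), len(A[0])
    cum_x, last_x = [0.0] * n, [0.0] * n   # row player: cumulative / latest losses
    cum_y, last_y = [0.0] * m, [0.0] * m   # column player: cumulative / latest payoffs
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        # Optimistic step: the latest vector serves as a prediction of the next one.
        x = softmax([-eta * (c + l) for c, l in zip(cum_x, last_x)])
        y = softmax([eta * (c + l) for c, l in zip(cum_y, last_y)])
        last_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]  # A y
        last_y = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]  # A^T x
        cum_x = [c + l for c, l in zip(cum_x, last_x)]
        cum_y = [c + l for c, l in zip(cum_y, last_y)]
        avg_x = [a + xi / rounds for a, xi in zip(avg_x, x)]
        avg_y = [a + yj / rounds for a, yj in zip(avg_y, y)]
    # Gap = best column response to avg_x minus best row response to avg_y.
    best_col = max(sum(A[i][j] * avg_x[i] for i in range(n)) for j in range(m))
    best_row = min(sum(A[i][j] * avg_y[j] for j in range(m)) for i in range(n))
    return best_col - best_row

A = [[1.0, -1.0], [-0.5, 1.0]]   # hypothetical game with a fully mixed equilibrium
gap = optimistic_hedge_gap(A)
print(gap)
```

A duality gap of zero means the averaged strategies form a minimax equilibrium, so the printed value measures how far the averages still are from one after the chosen number of rounds.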

algorithm, optimization, sequence, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.34)

Exponential weight averaging as damped harmonic motion

Patsenker, Jonathan, Li, Henry, Kluger, Yuval

arXiv.org Artificial Intelligence, Oct-20-2023

The exponential moving average (EMA) is a commonly used statistic for providing stable estimates of stochastic quantities in deep learning optimization. Recently, EMA has seen considerable use in generative models, where it is computed with respect to the model weights and significantly improves the stability of the inference model during and after training. While the practice of weight averaging at the end of training is well studied and known to improve estimates of local optima, the benefits of EMA over the course of training are less understood. In this paper, we derive an explicit connection between EMA and a damped harmonic system between two particles, where one particle (the EMA weights) is drawn to the other (the model weights) via an idealized zero-length spring. We then leverage this physical analogy to analyze the effectiveness of EMA, and propose an improved training algorithm, which we call BELAY. Finally, we demonstrate theoretically and empirically several advantages enjoyed by BELAY over standard EMA.
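The standard EMA update can be written so that the pull of the averaged weights toward the model weights is explicit, which is the form that invites the spring analogy. The sketch below shows only that standard update, not BELAY; the `decay` value, the fixed `target` weights, and the name `ema_update` are assumptions for illustration.

```python
def ema_update(ema, weights, decay):
    """One EMA step, written as a pull of the average toward the current weights."""
    # Equivalent to decay * ema + (1 - decay) * weights, element by element.
    return [e + (1.0 - decay) * (w - e) for e, w in zip(ema, weights)]

decay = 0.9
target = [1.0, -2.0]   # hypothetical model weights, held fixed for illustration
ema = [0.0, 0.0]
for _ in range(50):
    ema = ema_update(ema, target, decay)
print(ema)
```

With the model weights held fixed, the remaining distance between the average and the weights shrinks by the factor `decay` at every step, so after n steps each coordinate equals the target scaled by 1 - decay^n.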

exponential weight, harmonic motion

arXiv.org Artificial Intelligence

2310.13854

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.53)

Boosting Nyström Method

Hamm, Keaton, Lu, Zhaoying, Ouyang, Wenbo, Zhang, Hao Helen

arXiv.org Artificial Intelligence, Feb-21-2023

The Nyström method is an effective tool to generate low-rank approximations of large matrices, and it is particularly useful for kernel-based learning. To improve the standard Nyström approximation, ensemble Nyström algorithms compute a mixture of Nyström approximations which are generated independently based on column resampling. We propose a new family of algorithms, boosting Nyström, which iteratively generate multiple "weak" Nyström approximations (each using a small number of columns) in an adaptive sequence, where each approximation aims to compensate for the weaknesses of its predecessor, and then combine them to form one strong approximation. We demonstrate that our boosting Nyström algorithms can yield more efficient and accurate low-rank approximations to kernel matrices. Improvements over the standard and ensemble Nyström methods are illustrated by simulation studies and real-world data analysis.
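For reference, the two baselines named in the abstract have a compact standard form. The notation below ($K$, $C$, $W$, $\mu_r$, $p$) is the usual textbook notation and is an assumption here, since the abstract itself fixes no symbols; the boosting variant is not reproduced because the abstract does not specify its update.

```latex
% Standard Nystrom: sample m columns of the n x n kernel matrix K.
% C is the n x m matrix of sampled columns, W the m x m block of K
% on the sampled indices, and W^{+} its pseudo-inverse.
\tilde{K} = C \, W^{+} C^{\top}

% Ensemble Nystrom: a convex mixture of p independently sampled approximations.
\tilde{K}^{\mathrm{ens}} = \sum_{r=1}^{p} \mu_r \, C_r W_r^{+} C_r^{\top},
\qquad \mu_r \ge 0, \quad \sum_{r=1}^{p} \mu_r = 1
```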

approximation, artificial intelligence, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2302.11032

Country:

North America > United States > Arizona > Pima County > Tucson (0.14)
North America > United States > Texas > Tarrant County > Arlington (0.04)
North America > United States > Colorado (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

WildWood: a new Random Forest algorithm

Gaïffas, Stéphane, Merad, Ibrahim, Yu, Yiyang

arXiv.org Machine Learning, Sep-16-2021

We introduce WildWood (WW), a new ensemble algorithm for supervised learning of Random Forest (RF) type. While standard RF algorithms use bootstrap out-of-bag samples to compute out-of-bag scores, WW uses these samples to produce improved predictions given by an aggregation of the predictions of all possible subtrees of each fully grown tree in the forest. This is achieved by aggregation with exponential weights computed over out-of-bag samples, which are computed exactly and very efficiently thanks to an algorithm called context tree weighting. This improvement, combined with a histogram strategy to accelerate split finding, makes WW fast and competitive compared with other well-established ensemble methods, such as standard RF and extreme gradient boosting algorithms.

algorithm, node, prediction, (16 more...)

arXiv.org Machine Learning

2109.0801

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Exploration by Optimisation in Partial Monitoring

Lattimore, Tor, Szepesvari, Csaba

arXiv.org Machine Learning, Jul-12-2019

We provide a simple and efficient algorithm for adversarial $k$-action $d$-outcome non-degenerate locally observable partial monitoring games for which the $n$-round minimax regret is bounded by $3(d+1) k^{3/2} \sqrt{8n \log(k)}$, matching the best known information-theoretic upper bounds.
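To get a feel for the size of the stated bound, it can be evaluated numerically. The sketch below assumes that $\log$ is the natural logarithm; the example values of `k`, `d`, and `n` and the name `regret_bound` are assumptions for illustration.

```python
import math

def regret_bound(k, d, n):
    """Evaluate 3 (d + 1) k^{3/2} sqrt(8 n log k) for k actions, d outcomes, n rounds."""
    return 3 * (d + 1) * k ** 1.5 * math.sqrt(8 * n * math.log(k))

for n in (10**4, 10**6, 10**8):
    bound = regret_bound(k=3, d=2, n=n)
    print(n, round(bound), bound / n)
```

Because the bound grows like the square root of `n`, the ratio `bound / n` in the last column shrinks as the horizon grows; for short horizons it can exceed 1, in which case the bound carries no information if per-round losses are assumed to lie in [0, 1].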

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1907.05772

Country: