kumar
- Health & Medicine (0.46)
- Education (0.34)
- North America > United States (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > United States > California > Alameda County > Oakland (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- North America > United States (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Brazos County > College Station (0.14)
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
26657d5ff9020d2abefe558796b99584-Paper.pdf
Specifically, there now exists a tight relaxation for verifying therobustness ofaneural networkto` input perturbations, aswell asefficient primal and dual solvers for the relaxation. Buoyed by this success, we consider the problem of developing similar techniques for verifying robustness to input perturbations within the probability simplex. We prove a somewhat surprising result that,inthiscase, notonlycanonedesign atightrelaxation thatovercomes the convexbarrier,butthe size ofthe relaxation remains linear inthe number of neurons, thereby leading tosimpler and more efficient algorithms.
OfflineReinforcementLearningasOneBig SequenceModelingProblem
Reinforcement learning (RL) is typically concerned with estimating stationary policies orsingle-step models, leveraging theMarkovproperty tofactorize problems in time. However, we can also view RL as a generic sequence modeling problem, with the goal being to produce a sequence of actions that leads to a sequence ofhighrewards.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum
Kumar, Navdeep, Dahan, Tehila, Cohen, Lior, Barua, Ananyabrata, Ramponi, Giorgia, Levy, Kfir Yehuda, Mannor, Shie
We establish an optimal sample complexity of $O(ε^{-2})$ for obtaining an $ε$-optimal global policy using a single-timescale actor-critic (AC) algorithm in infinite-horizon discounted Markov decision processes (MDPs) with finite state-action spaces, improving upon the prior state of the art of $O(ε^{-3})$. Our approach applies STORM (STOchastic Recursive Momentum) to reduce variance in the critic updates. However, because samples are drawn from a nonstationary occupancy measure induced by the evolving policy, variance reduction via STORM alone is insufficient. To address this challenge, we maintain a buffer of small fraction of recent samples and uniformly sample from it for each critic update. Importantly, these mechanisms are compatible with existing deep learning architectures and require only minor modifications, without compromising practical applicability.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)