
 Sherman, Uri


Better Rates for Random Task Orderings in Continual Linear Models

arXiv.org Machine Learning

We study the common continual learning setup where an overparameterized model is sequentially fitted to a set of jointly realizable tasks. We analyze the forgetting, i.e., the loss on previously seen tasks, after $k$ iterations. For linear models, we prove that fitting a task is equivalent to a single stochastic gradient descent (SGD) step on a modified objective. We develop novel last-iterate SGD upper bounds in the realizable least squares setup, and apply them to derive new results for continual learning. Focusing on random orderings over $T$ tasks, we establish universal forgetting rates, whereas existing rates depend on the problem dimensionality or complexity. Specifically, in continual regression with replacement, we improve the best existing rate from $O((d-r)/k)$ to $O(\min(k^{-1/4}, \sqrt{d-r}/k, \sqrt{Tr}/k))$, where $d$ is the dimensionality and $r$ the average task rank. Furthermore, we establish the first rates for random task orderings without replacement. The obtained rate of $O(\min(T^{-1/4}, (d-r)/T))$ proves for the first time that randomization alone, with no task repetition, can prevent catastrophic forgetting in sufficiently long task sequences. Finally, we prove a similar $O(k^{-1/4})$ universal rate for the forgetting in continual linear classification on separable data. Our universal rates apply to broader projection methods, such as block Kaczmarz and POCS, illuminating their loss convergence under i.i.d. and one-pass orderings.
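A minimal sketch of the setting described above, under illustrative choices of dimension, task rank, and ordering: jointly realizable linear regression tasks are fitted sequentially by projecting the current iterate onto each task's solution set (the block Kaczmarz / POCS view mentioned in the abstract), and forgetting is tracked as the average loss over previously seen tasks. All parameters below are placeholders, not the paper's constructions.

```python
import numpy as np

# Illustrative sketch (not the paper's construction): jointly realizable linear
# regression tasks fitted sequentially by projecting the current iterate onto
# each task's solution set -- the block Kaczmarz / POCS view of continual
# linear regression.
rng = np.random.default_rng(0)
d, T, r = 50, 200, 5                 # dimension, number of tasks, task rank
w_star = rng.normal(size=d)          # shared solution => tasks are jointly realizable

tasks = []
for _ in range(T):
    X = rng.normal(size=(r, d))      # task data matrix of rank r (w.h.p.)
    tasks.append((X, X @ w_star))    # realizable labels y = X w_star

w = np.zeros(d)
order = rng.integers(0, T, size=T)   # random ordering with replacement
for t in order:
    X, y = tasks[t]
    # Fitting task t to convergence from w = Euclidean projection of w onto {v : X v = y}
    w = w + np.linalg.pinv(X) @ (y - X @ w)

seen = np.unique(order)
forgetting = np.mean([np.mean((tasks[t][0] @ w - tasks[t][1]) ** 2) for t in seen])
print(f"average loss over the {seen.size} seen tasks: {forgetting:.2e}")
```

Because all tasks share the realizable solution $w_\star$, each projection step can only decrease the distance to $w_\star$; this is the basic structure that forgetting-rate analyses of this kind exploit.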


Convergence of Policy Mirror Descent Beyond Compatible Function Approximation

arXiv.org Machine Learning

Modern policy optimization methods roughly follow the policy mirror descent (PMD) algorithmic template, for which there are by now numerous theoretical convergence results. However, most of these either target tabular environments, or can be applied effectively only when the class of policies being optimized over satisfies strong closure conditions, which is typically not the case when working with parametric policy classes in large-scale environments. In this work, we develop a theoretical framework for PMD for general policy classes where we replace the closure conditions with a strictly weaker variational gradient dominance assumption, and obtain upper bounds on the rate of convergence to the best-in-class policy. Our main result leverages a novel notion of smoothness with respect to a local norm induced by the occupancy measure of the current policy, and casts PMD as a particular instance of smooth non-convex optimization in non-Euclidean space.
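For orientation, a standard instance of the PMD template (the tabular case with direct parameterization over the simplex) performs, at iteration $k$ and for every state $s$, the mirror step
$$
\pi_{k+1}(\cdot \mid s) \;\in\; \operatorname*{argmax}_{p \in \Delta(\mathcal{A})} \Big\{ \eta \,\big\langle Q^{\pi_k}(s,\cdot),\, p \big\rangle \;-\; D_h\big(p,\, \pi_k(\cdot \mid s)\big) \Big\},
$$
where $Q^{\pi_k}$ is the action-value function of the current policy, $\eta$ a step size, and $D_h$ the Bregman divergence of the mirror map $h$ (negative entropy recovers softmax / natural policy gradient style updates). The framework described above replaces the simplex $\Delta(\mathcal{A})$ with a general parametric policy class; the notation here is generic and not taken from the paper.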


Rate-Optimal Policy Optimization for Linear Markov Decision Processes

arXiv.org Artificial Intelligence

Policy Optimization (PO) algorithms are a class of methods in Reinforcement Learning (RL; Sutton and Barto, 2018; Mannor et al., 2022) where the agent's policy is iteratively updated according to the (possibly preconditioned) gradient of the value function w.r.t. the policy parameters.


The Dimension Strikes Back with Gradients: Generalization of Gradient Methods in Stochastic Convex Optimization

arXiv.org Artificial Intelligence

The study of generalization properties of stochastic optimization algorithms has been at the heart of contemporary machine learning research. While in more classical frameworks studies largely focused on the learning problem itself (e.g., Alon et al., 1997; Blumer et al., 1989), in the past decade it has become clear that in modern scenarios the particular algorithm used to learn the model plays a vital role in its generalization performance. As a prominent example, heavily over-parameterized deep neural networks trained by first-order methods output models that generalize well, despite the fact that an arbitrarily chosen Empirical Risk Minimizer (ERM) may perform poorly (Zhang et al., 2017; Neyshabur et al., 2014, 2017). The present paper aims at understanding the generalization behavior of gradient methods, specifically in connection with the problem dimension, in the fundamental Stochastic Convex Optimization (SCO) learning setup; a well-studied theoretical framework widely used to analyze stochastic optimization algorithms. The seminal work of Shalev-Shwartz et al. (2010) was the first to show that uniform convergence, the canonical condition for generalization in statistical learning (e.g., Vapnik, 1971; Bartlett and Mendelson, 2002), may not hold in high-dimensional SCO: they demonstrated learning problems in which certain ERMs overfit the training data (i.e., exhibit large population risk), while models produced by, e.g., Stochastic Gradient Descent (SGD) or regularized empirical risk minimization generalize well. The construction presented by Shalev-Shwartz et al. (2010), however, featured a learning problem with dimension exponential in the number of training examples.


Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation

arXiv.org Artificial Intelligence

Reinforcement Learning (RL; Sutton and Barto, 2018; Mannor et al., 2022) studies online decision-making problems in which an agent learns through experience within a dynamic environment, with the goal of minimizing a loss function associated with the agent-environment interaction. Modern applications of RL such as robotics (Schulman et al., 2015; Lillicrap et al., 2015; Akkaya et al., 2019), game playing (Mnih et al., 2013; Silver et al., 2018), and autonomous driving (Kiran et al., 2021) almost invariably consist of large-scale environments where function approximation techniques are necessary to allow the agent to generalize across different states. Furthermore, some form of agent robustness is usually required to cope with environment irregularities that cannot be faithfully represented by stochasticity assumptions (see, e.g., Dulac-Arnold et al., 2021). Theoretical foundations for RL with function approximation (e.g., Jiang et al., 2017; Yang and Wang, 2019; Jin et al., 2020b; Agarwal et al., 2020) have been steadily coming to fruition.


Benign Underfitting of Stochastic Gradient Descent

arXiv.org Artificial Intelligence

We study to what extent may stochastic gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex optimization framework, where (one pass, without-replacement) SGD is classically known to minimize the population risk at rate $O(1/\sqrt n)$, and prove that, surprisingly, there exist problem instances where the SGD solution exhibits both empirical risk and generalization gap of $\Omega(1)$. Consequently, it turns out that SGD is not algorithmically stable in any sense, and its generalization ability cannot be explained by uniform convergence or any other currently known generalization bound technique for that matter (other than that of its classical analysis). We then continue to analyze the closely related with-replacement SGD, for which we show that an analogous phenomenon does not occur and prove that its population risk does in fact converge at the optimal rate. Finally, we interpret our main results in the context of without-replacement SGD for finite-sum convex optimization problems, and derive upper and lower bounds for the multi-epoch regime that significantly improve upon previously known results.
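As a point of reference, the classical protocol the abstract refers to is one-pass, without-replacement SGD over $n$ samples of a convex loss, returning the averaged iterate. The sketch below only illustrates that protocol and the $O(1/\sqrt n)$ regime on a simple least-squares instance; step sizes and problem sizes are illustrative, and the paper's $\Omega(1)$ results rely on a dedicated high-dimensional construction, not a problem like this one.

```python
import numpy as np

# Sketch of the protocol referenced above: one-pass, without-replacement SGD on
# n samples of a convex least-squares loss, returning the averaged iterate.
rng = np.random.default_rng(1)
d, n = 20, 1000
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

eta = 1.0 / np.sqrt(n)               # illustrative step size
perm = rng.permutation(n)            # a single pass, without replacement
w = np.zeros(d)
w_avg = np.zeros(d)
for i, idx in enumerate(perm, start=1):
    g = (X[idx] @ w - y[idx]) * X[idx]   # gradient of 0.5 * (x.w - y)^2 at w
    w = w - eta * g
    w_avg += (w - w_avg) / i             # running average of the iterates

print(f"empirical risk of the averaged iterate: {0.5 * np.mean((X @ w_avg - y) ** 2):.4f}")
```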


Regret Minimization and Convergence to Equilibria in General-sum Markov Games

arXiv.org Artificial Intelligence

An abundance of recent impossibility results establish that regret minimization in Markov games with adversarial opponents is both statistically and computationally intractable. Nevertheless, none of these results preclude the possibility of regret minimization under the assumption that all parties adopt the same learning procedure. In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. The bounds we obtain are for swap regret, and thus, along the way, imply convergence to a correlated equilibrium. Our algorithm is decentralized, computationally efficient, and does not require any communication between agents. Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence. Consequently, controlling the path length leads to weighted regret objectives for which sufficiently adaptive algorithms provide sublinear regret guarantees.


Optimal Rates for Random Order Online Optimization

arXiv.org Machine Learning

We study online convex optimization in the random order model, recently proposed by \citet{garber2020online}, where the loss functions may be chosen by an adversary, but are then presented to the online algorithm in a uniformly random order. Focusing on the scenario where the cumulative loss function is (strongly) convex, yet individual loss functions are smooth but might be non-convex, we give algorithms that achieve the optimal bounds and significantly outperform the results of \citet{garber2020online}, completely removing the dimension dependence and improving their scaling with respect to the strong convexity parameter. Our analysis relies on novel connections between algorithmic stability and generalization for sampling without replacement, analogous to those studied in the with-replacement i.i.d.~setting, as well as on a refined average stability analysis of stochastic gradient descent.
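A sketch of the random-order protocol itself, with plain online gradient descent as a placeholder learner: a fixed batch of smooth losses is chosen up front, revealed in a uniformly random order, and regret is measured against the best fixed decision in hindsight. The losses and step sizes below are illustrative and are not the algorithms analyzed in the paper.

```python
import numpy as np

# Sketch of the random-order model: a fixed batch of smooth losses is chosen in
# advance, then presented to the learner (plain online gradient descent here)
# in a uniformly random order.
rng = np.random.default_rng(2)
T, d = 500, 10
A = rng.normal(size=(T, d))
b = rng.normal(size=T)
loss = lambda w, i: 0.5 * (A[i] @ w - b[i]) ** 2
grad = lambda w, i: (A[i] @ w - b[i]) * A[i]

order = rng.permutation(T)           # uniformly random presentation order
w = np.zeros(d)
cum_loss = 0.0
for t, i in enumerate(order, start=1):
    cum_loss += loss(w, i)
    w = w - (1.0 / t) * grad(w, i)   # illustrative decaying step size

w_star, *_ = np.linalg.lstsq(A, b, rcond=None)   # best fixed decision in hindsight
regret = cum_loss - sum(loss(w_star, i) for i in range(T))
print(f"random-order regret of OGD: {regret:.3f}")
```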


Lazy OCO: Online Convex Optimization on a Switching Budget

arXiv.org Machine Learning

We study a variant of online convex optimization where the player is permitted to switch decisions at most $S$ times in expectation throughout $T$ rounds. Similar problems have been addressed in prior work for the discrete decision set setting, and more recently in the continuous setting but only with an adaptive adversary. In this work, we aim to fill the gap and present computationally efficient algorithms in the more prevalent oblivious setting, establishing a regret bound of $O(T/S)$ for general convex losses and $\widetilde O(T/S^2)$ for strongly convex losses. In addition, for stochastic i.i.d.~losses, we present a simple algorithm that performs $\log T$ switches with only a multiplicative $\log T$ factor overhead in its regret in both the general and strongly convex settings. Finally, we complement our algorithms with lower bounds that match our upper bounds in some of the cases we consider.
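For intuition, the simplest baseline in this switching-budget regime is blocking: split the $T$ rounds into roughly $S$ blocks, keep the decision fixed within each block, and take a single projected gradient step on the block's accumulated gradient at each boundary, so that at most $S$ switches occur. The sketch below illustrates only this constraint; the gradient oracle `grad_fn` and all parameters are hypothetical placeholders, not the paper's algorithms.

```python
import numpy as np

# "Blocking" baseline for OCO with a switching budget: T rounds are split into
# roughly S blocks, the decision is held fixed inside each block, and a single
# projected gradient step on the block's accumulated gradient is taken at each
# boundary, so at most S switches occur.
def lazy_ogd(grad_fn, T, S, d, eta, radius=1.0):
    w = np.zeros(d)
    block_len = max(T // S, 1)
    g_acc = np.zeros(d)
    plays = []
    for t in range(T):
        plays.append(w.copy())
        g_acc += grad_fn(t, w)
        if (t + 1) % block_len == 0:     # switch only at block boundaries
            w = w - eta * g_acc
            norm = np.linalg.norm(w)
            if norm > radius:            # project back onto the Euclidean ball
                w = w * (radius / norm)
            g_acc = np.zeros(d)
    return plays

# Example usage with fixed linear losses g_t . w (illustrative only).
rng = np.random.default_rng(3)
G = rng.normal(size=(1000, 5))
decisions = lazy_ogd(lambda t, w: G[t], T=1000, S=20, d=5, eta=0.05)
```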