AITopics | Meyn, Sean

Collaborating Authors

Meyn, Sean

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Functional role of synchronization: A mean-field control perspective

Mehta, Prashant, Meyn, Sean

arXiv.org Machine LearningFeb-1-2025

Our friend and mentor Peter Caines has, together with his colleagues, created new foundations for studying collective dynamics in complex systems. Of particular inspiration to us has been his pioneering work in mean-field games (MFGs) launched two decades ago [10, 24, 25], and the related field of mean-field control. Peter pointed the way to both formulate and solve the problem of collective dynamics arising in a large population of heterogeneous dynamical systems. In this paper we survey some elements of MFGs within the context of controlled coupled oscillators. We begin by introducing a model for a single oscillator: dθ(t) = (ω + u(t)) dt + σ dξ(t), mod 2π (1) where θ(t) [0, 2π) is the phase of the oscillator at time t, ω is the nominal frequency with units of radiansper-second, {ξ(t): t 0} is a standard Wiener process, and u(t) is a control signal whose interpretation depends on the context. Unless otherwise noted, the SDEs are interpreted in their Itô form.

artificial intelligence, machine learning, oscillator, (19 more...)

arXiv.org Machine Learning

2502.0059

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > Illinois (0.14)

Genre: Research Report (0.64)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Game Theory (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Revisiting Step-Size Assumptions in Stochastic Approximation

Lauand, Caio Kalil, Meyn, Sean

arXiv.org Machine LearningJun-3-2024

Many machine learning and optimization algorithms are built upon the framework of stochastic approximation (SA), for which the selection of step-size (or learning rate) is essential for success. For the sake of clarity, this paper focuses on the special case $\alpha_n = \alpha_0 n^{-\rho}$ at iteration $n$, with $\rho \in [0,1]$ and $\alpha_0>0$ design parameters. It is most common in practice to take $\rho=0$ (constant step-size), while in more theoretically oriented papers a vanishing step-size is preferred. In particular, with $\rho \in (1/2, 1)$ it is known that on applying the averaging technique of Polyak and Ruppert, the mean-squared error (MSE) converges at the optimal rate of $O(1/n)$ and the covariance in the central limit theorem (CLT) is minimal in a precise sense. The paper revisits step-size selection in a general Markovian setting. Under readily verifiable assumptions, the following conclusions are obtained provided $0<\rho<1$: $\bullet$ Parameter estimates converge with probability one, and also in $L_p$ for any $p\ge 1$. $\bullet$ The MSE may converge very slowly for small $\rho$, of order $O(\alpha_n^2)$ even with averaging. $\bullet$ For linear stochastic approximation the source of slow convergence is identified: for any $\rho\in (0,1)$, averaging results in estimates for which the error $\textit{covariance}$ vanishes at the optimal rate, and moreover the CLT covariance is optimal in the sense of Polyak and Ruppert. However, necessary and sufficient conditions are obtained under which the $\textit{bias}$ converges to zero at rate $O(\alpha_n)$. This is the first paper to obtain such strong conclusions while allowing for $\rho \le 1/2$. A major conclusion is that the choice of $\rho =0$ or even $\rho<1/2$ is justified only in select settings -- In general, bias may preclude fast convergence.

approximation, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

2405.17834

Country:

North America > United States > Massachusetts (0.14)
North America > United States > Florida > Alachua County > Gainesville (0.14)

Genre:

Overview (0.46)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

The Curse of Memory in Stochastic Approximation: Extended Version

Lauand, Caio Kalil, Meyn, Sean

arXiv.org Machine LearningSep-17-2023

Theory and application of stochastic approximation (SA) has grown within the control systems community since the earliest days of adaptive control. This paper takes a new look at the topic, motivated by recent results establishing remarkable performance of SA with (sufficiently small) constant step-size $\alpha>0$. If averaging is implemented to obtain the final parameter estimate, then the estimates are asymptotically unbiased with nearly optimal asymptotic covariance. These results have been obtained for random linear SA recursions with i.i.d. coefficients. This paper obtains very different conclusions in the more common case of geometrically ergodic Markovian disturbance: (i) The $\textit{target bias}$ is identified, even in the case of non-linear SA, and is in general non-zero. The remaining results are established for linear SA recursions: (ii) the bivariate parameter-disturbance process is geometrically ergodic in a topological sense; (iii) the representation for bias has a simpler form in this case, and cannot be expected to be zero if there is multiplicative noise; (iv) the asymptotic covariance of the averaged parameters is within $O(\alpha)$ of optimal. The error term is identified, and may be massive if mean dynamics are not well conditioned. The theory is illustrated with application to TD-learning.

asymptotic covariance, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2309.02944

Country: North America > United States > Kansas > Douglas County > Lawrence (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

Add feedback

Convex Q Learning in a Stochastic Environment: Extended Version

Lu, Fan, Meyn, Sean

arXiv.org Artificial IntelligenceSep-10-2023

The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of optimal control. The main contributions firstly concern properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution, and a significant relationship between the solution to the new convex program, and the solution to standard Q-learning. The second set of contributions concern algorithm design and analysis: (i) A direct model-free method for approximating the convex program for Q-learning shares properties with its ideal. In particular, a bounded solution is ensured subject to a simple property of the basis functions; (ii) The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense; (iii) The approach can be generalized to a range of performance criteria, and it is found that variance can be reduced by considering ``relative'' dynamic programming equations; (iv) The theory is illustrated with an application to a classical inventory control problem.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2309.05105

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Stability of Q-Learning Through Design and Optimism

Meyn, Sean

arXiv.org Artificial IntelligenceAug-21-2023

Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. The purpose of this paper is in part a tutorial on stochastic approximation and Q-learning, providing details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in Nancy France, June 2023. The paper also presents new approaches to ensure stability and potentially accelerated convergence for these algorithms, and stochastic approximation in other settings. Two contributions are entirely new: 1. Stability of Q-learning with linear function approximation has been an open topic for research for over three decades. It is shown that with appropriate optimistic training in the form of a modified Gibbs policy, there exists a solution to the projected Bellman equation, and the algorithm is stable (in terms of bounded parameter estimates). Convergence remains one of many open topics for research. 2. The new Zap Zero algorithm is designed to approximate the Newton-Raphson flow without matrix inversion. It is stable and convergent under mild assumptions on the mean flow vector field for the algorithm, and compatible statistical assumption on an underlying Markov chain. The algorithm is a general approach to stochastic approximation which in particular applies to Q-learning with "oblivious" training even with non-linear function approximation.

artificial intelligence, machine learning, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

2307.02632

Country:

North America > United States > New York (0.28)
Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.24)
North America > United States > Massachusetts > Middlesex County (0.14)
(3 more...)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.86)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

Borkar, Vivek, Chen, Shuhang, Devraj, Adithya, Kontoyiannis, Ioannis, Meyn, Sean

arXiv.org Artificial IntelligenceJun-18-2023

The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ in which $\Phi$ is a geometrically ergodic Markov chain on a general state space $\textsf{X}$ with stationary distribution $\pi$, and $f:\Re^d\times\textsf{X}\to\Re^d$. The main results are established under a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3), and a stability condition for the mean flow with vector field $\bar{f}(\theta)=\textsf{E}[f(\theta,\Phi)]$, with $\Phi\sim\pi$. (i) $\{ \theta_n\}$ is convergent a.s. and in $L_4$ to the unique root $\theta^*$ of $\bar{f}(\theta)$. (ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. (iii) The CLT holds for the normalized version, $z_n{=:} \sqrt{n} (\theta^{\text{PR}}_n -\theta^*)$, of the averaged parameters, $\theta^{\text{PR}}_n {=:} n^{-1} \sum_{k=1}^n\theta_k$, subject to standard assumptions on the step-size. Moreover, the normalized covariance converges, $$ \lim_{n \to \infty} n \textsf{E} [ {\widetilde{\theta}}^{\text{ PR}}_n ({\widetilde{\theta}}^{\text{ PR}}_n)^T ] = \Sigma_\theta^*,\;\;\;\textit{with $\widetilde{\theta}^{\text{ PR}}_n = \theta^{\text{ PR}}_n -\theta^*$,} $$ where $\Sigma_\theta^*$ is the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain $\Phi$ is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded: $ \textsf{E} [ \| \theta_n \|^2 ] \to \infty$ as $n\to\infty$.

approximation, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2110.14427

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States > Kansas (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.55)

Add feedback

Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation

Chen, Shuhang, Devraj, Adithya M., Bušić, Ana, Meyn, Sean

arXiv.org Machine LearningFeb-6-2020

This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted as special cases of stochastic approximation (SA). It is argued that it is not possible in general to obtain a Hoeffding bound on the error sequence, even when the underlying Markov chain is reversible and geometrically ergodic, such as the M/M/1 queue. This is motivation for the focus on mean square error bounds for parameter estimates. It is shown that mean square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence. Moreover, the exact constants in the rate are obtained, which is of great value in algorithm design.

artificial intelligence, reinforcement learning, sequence, (17 more...)

arXiv.org Machine Learning

2002.02584

Country:

North America > United States (0.28)
Europe (0.28)
Asia > India (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Add feedback

Zap Q-Learning With Nonlinear Function Approximation

Chen, Shuhang, Devraj, Adithya M., Bušić, Ana, Meyn, Sean

arXiv.org Machine LearningOct-11-2019

The Zap stochastic approximation (SA) algorithm was introduced recently as a means to accelerate convergence in reinforcement learning algorithms. While numerical results were impressive, stability (in the sense of boundedness of parameter estimates) was established in only a few special cases. This class of algorithms is generalized in this paper, and stability is established under very general conditions. This general result can be applied to a wide range of algorithms found in reinforcement learning. Two classes are considered in this paper: (i)The natural generalization of Watkins' algorithm is not always stable in function approximation settings. Parameter estimates may diverge to infinity even in the \textit{linear} function approximation setting with a simple finite state-action MDP. Under mild conditions, the Zap SA algorithm provides a stable algorithm, even in the case of \textit{nonlinear} function approximation. (ii) The GQ algorithm of Maei et.~al.~2010 is designed to address the stability challenge. Analysis is provided to explain why the algorithm may be very slow to converge in practice. The new Zap GQ algorithm is stable even for nonlinear function approximation.

artificial intelligence, fuzzy logic, null, (20 more...)

arXiv.org Machine Learning

1910.05405

Country:

North America > United States > New York (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Zap Q-Learning

Devraj, Adithya M, Meyn, Sean

Neural Information Processing SystemsDec-31-2017

The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-scale update equation for the matrix gain sequence. The analysis suggests that the approach will lead to stable and efficient computation even for non-ideal parameterized settings. Numerical experiments confirm the quick convergence, even in such non-ideal cases.

algorithm, artificial intelligence, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.14)
North America > United States > Florida > Alachua County > Gainesville (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback