Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration
Q-learning and SARSA are foundational reinforcement learning algorithms whose practical success depends critically on step-size calibration. Step-sizes that are too large can cause numerical instability, while step-sizes that are too small lead to slow progress. We propose implicit variants of Q-learning and SARSA that reformulate their iterative updates as fixed-point equations. This yields an adaptive step-size adjustment that scales inversely with feature norms, providing automatic regularization without manual tuning. Our non-asymptotic analyses demonstrate that implicit methods maintain stability over significantly broader step-size ranges. Under favorable conditions, they permit arbitrarily large step-sizes while achieving comparable convergence rates. Empirical validation across benchmark environments spanning discrete and continuous state spaces shows that implicit Q-learning and SARSA exhibit substantially reduced sensitivity to step-size selection, achieving stable performance with step-sizes that would cause standard methods to fail.
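The implicit update described in this abstract can be sketched for linear function approximation: evaluating the current-state value at the post-update parameters and solving the resulting fixed-point equation in closed form recovers the standard TD error, scaled down by the feature norm. A minimal illustration under these assumptions (the function name is hypothetical, and this is not the paper's full algorithm):

```python
import numpy as np

def implicit_sarsa_update(theta, phi, reward, phi_next, alpha, gamma):
    """One implicit SARSA update with linear function approximation.

    Solving theta' = theta + alpha * (r + gamma * phi_next @ theta
    - phi @ theta') * phi for theta' yields the ordinary TD error
    applied with the shrunken step-size alpha / (1 + alpha * ||phi||^2),
    which stays bounded however large alpha is chosen.
    """
    td_error = reward + gamma * phi_next @ theta - phi @ theta
    effective_alpha = alpha / (1.0 + alpha * (phi @ phi))
    return theta + effective_alpha * td_error * phi
```

Note that the effective step-size is capped at `1 / ||phi||^2`, which is the mechanism behind the claimed robustness to large step-sizes.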
Convergence Guarantees for Federated SARSA with Local Training and Heterogeneous Agents
Mangold, Paul, Berthier, Eloïse, Moulines, Eric
We present a novel theoretical analysis of Federated SARSA (FedSARSA) with linear function approximation and local training. We establish convergence guarantees for FedSARSA in the presence of heterogeneity, both in local transitions and rewards, providing the first sample and communication complexity bounds in this setting. At the core of our analysis is a new, exact multi-step error expansion for single-agent SARSA, which is of independent interest. Our analysis precisely quantifies the impact of heterogeneity, demonstrating the convergence of FedSARSA with multiple local updates. Crucially, we show that FedSARSA achieves linear speed-up with respect to the number of agents, up to higher-order terms due to Markovian sampling. Numerical experiments support our theoretical findings.
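The local-training-then-aggregation scheme that the analysis covers can be sketched as follows. This is a generic illustration of federated SARSA with linear function approximation, not the paper's exact FedSARSA pseudocode; `sample_transition` is an assumed callback standing in for each agent's (possibly heterogeneous) environment:

```python
import numpy as np

def fedsarsa_round(thetas, local_steps, sample_transition, alpha, gamma):
    """One communication round: each agent runs `local_steps` linear
    SARSA updates on its own transitions, then the server averages the
    resulting parameter vectors.

    sample_transition(agent) -> (phi, reward, phi_next) is assumed to
    return the feature vectors of the agent's current and next
    state-action pairs under its behavior policy.
    """
    for agent, theta in enumerate(thetas):
        for _ in range(local_steps):
            phi, reward, phi_next = sample_transition(agent)
            td_error = reward + gamma * (phi_next @ theta) - phi @ theta
            theta += alpha * td_error * phi  # local SARSA step
    return np.mean(thetas, axis=0)  # server-side aggregation
```

Heterogeneity enters through `sample_transition`: different agents may draw from different transition kernels and reward functions, which is exactly the setting whose bias the paper's error expansion quantifies.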
Convergent Reinforcement Learning Algorithms for Stochastic Shortest Path Problem
Guin, Soumyajit, Bhatnagar, Shalabh
In this paper we propose two algorithms in the tabular setting and an algorithm for the function approximation setting for the Stochastic Shortest Path (SSP) problem. SSP problems form an important class of problems in Reinforcement Learning (RL), as other types of cost-criteria in RL can be formulated in the setting of SSP. We show asymptotic almost-sure convergence for all our algorithms. We observe superior performance of our tabular algorithms compared to other well-known convergent RL algorithms. We further observe reliable performance of our function approximation algorithm compared to other algorithms in the function approximation setting.
The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning -- Supplementary Material -- A: Tabular Experiments
Here, we discuss some additional settings for the tabular experiments. These are needed because Sarsa(0.95), in contrast to MB-VI and MB-SU, is a multi-step method. Therefore, there is stochasticity in the update target even in deterministic environments, due to exploration by the behavior policy. All methods used optimistic initialization. The pseudocode of the tabular, on-policy method used in Section 5.1 is shown in Algorithm 1. Its estimates are updated at the end of the episode, using the data gathered during that episode.
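The end-of-episode, multi-step targets used by a Sarsa(lambda)-style method can be illustrated with offline lambda-returns, computed backwards over a finished episode. This is a sketch of the general technique, not the supplementary material's Algorithm 1:

```python
import numpy as np

def episode_lambda_returns(rewards, q_next, lam, gamma):
    """Offline lambda-returns for one episode.

    rewards[t] is the reward after step t; q_next[t] is the estimate
    Q(s_{t+1}, a_{t+1}) under the behavior policy, with q_next[-1] = 0
    at termination. Uses the recursion
        G_t = r_t + gamma * ((1 - lam) * q_next[t] + lam * G_{t+1}),
    so lam = 0 gives one-step Sarsa targets and lam = 1 gives the
    Monte Carlo return.
    """
    T = len(rewards)
    G = np.zeros(T)
    g = 0.0  # return beyond the terminal state
    for t in reversed(range(T)):
        g = rewards[t] + gamma * ((1.0 - lam) * q_next[t] + lam * g)
        G[t] = g
    return G
```

With lam = 0.95, as in Sarsa(0.95), the targets blend bootstrapped estimates with sampled multi-step returns, which is the source of the target stochasticity noted above.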
An Analysis of Action-Value Temporal-Difference Methods That Learn State Values
Daley, Brett, Nagarajan, Prabhat, White, Martha, Machado, Marlos C.
The hallmark feature of temporal-difference (TD) learning is bootstrapping: using value predictions to generate new value predictions. The vast majority of TD methods for control learn a policy by bootstrapping from a single action-value function (e.g., Q-learning and Sarsa). Significantly less attention has been given to methods that bootstrap from two asymmetric value functions: i.e., methods that learn state values as an intermediate step in learning action values. Existing algorithms in this vein can be categorized as either QV-learning or AV-learning. Though these algorithms have been investigated to some degree in prior work, it remains unclear if and when it is advantageous to learn two value functions instead of just one--and whether such approaches are theoretically sound in general. In this paper, we analyze these algorithmic families in terms of convergence and sample efficiency. We find that while both families are more efficient than Expected Sarsa in the prediction setting, only AV-learning methods offer any major benefit over Q-learning in the control setting. Finally, we introduce a new AV-learning algorithm called Regularized Dueling Q-learning (RDQ), which significantly outperforms Dueling DQN in the MinAtar benchmark.
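The "two asymmetric value functions" idea can be made concrete with a tabular QV-learning step. The abstract does not fix the exact update, so the following uses one common formulation (state values learned by TD(0), action values bootstrapping from V rather than from Q); treat it as an assumed illustration:

```python
import numpy as np

def qv_update(Q, V, s, a, reward, s_next, alpha, gamma):
    """One tabular QV-learning step.

    Both tables share the target r + gamma * V(s'): V is updated by
    TD(0), and Q bootstraps from the state values V instead of from
    itself as in Sarsa or Q-learning. Updates Q and V in place.
    """
    target = reward + gamma * V[s_next]   # computed once, before V changes
    V[s] += alpha * (target - V[s])       # TD(0) update for state values
    Q[s, a] += alpha * (target - Q[s, a]) # action values bootstrap from V
```

The asymmetry is visible in the target: the intermediate state-value function V is what carries the bootstrapped prediction into the action-value update.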