AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

RobotDancing: Residual-Action Reinforcement Learning Enables Robust Long-Horizon Humanoid Motion Tracking

Sun, Zhenguo, Peng, Yibo, Meng, Yuan, Li, Xukun, Huang, Bo-Sheng, Bing, Zhenshan, Wang, Xinlong, Knoll, Alois

arXiv.org Artificial IntelligenceSep-26-2025

Abstract-- Long-horizon, high-dynamic motion tracking on humanoids remains brittle because absolute joint commands cannot compensate model-plant mismatch, leading to error accumulation. We propose RobotDancing, a simple, scalable framework that predicts residual joint targets to explicitly correct dynamics discrepancies. The pipeline is end-to-end--training, sim-to-sim validation, and zero-shot sim-to-real--and uses a single-stage reinforcement learning (RL) setup with a unified observation, reward, and hyperparameter configuration. RobotDancing can track multi-minute, high-energy behaviors (jumps, spins, cartwheels) and deploys zero-shot to hardware with high motion tracking quality. I. INTRODUCTION Humanoid robots are increasingly expected to execute long-horizon, highly dynamic behaviors such as dance, where small tracking errors compound rapidly and destabilize control. A principal source of such drift is the mismatch between idealized reference trajectories and the robot's true physics (actuation limits, friction, inertia, latency).

arxiv preprint arxiv, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2509.20717

Country: Asia > China (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

30754e5f4cd69d64b5527cdd87d3cf62-Paper-Conference.pdf

Neural Information Processing SystemsSep-25-2025, 23:08:22 GMT

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country: North America > Canada (0.46)

Genre: Research Report > Experimental Study (1.00)

Industry: Energy > Oil & Gas > Upstream (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

35fdecdf8861bc15110d48fbec3193cf-Paper-Conference.pdf

Neural Information Processing SystemsSep-25-2025, 16:56:33 GMT

data mining, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.71)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Add feedback

92fde850d824c2ba9b563cb6fa4078c3-Paper.pdf

Neural Information Processing SystemsSep-25-2025, 16:12:25 GMT

artificial intelligence, bandit, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > Canada > Alberta (0.28)

Industry: Energy > Oil & Gas > Upstream (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Data Science > Data Mining > Big Data (0.50)
(2 more...)

Add feedback

a36b598abb934e4528412e5a2127b931-Paper.pdf

Neural Information Processing SystemsSep-25-2025, 13:21:17 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Europe (0.93)
North America > United States (0.93)

Genre:

Research Report (0.46)
Instructional Material (0.46)

Industry: Energy > Oil & Gas > Upstream (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

2a91de02871011d0090e662ffd6f2328-Paper-Conference.pdf

Neural Information Processing SystemsSep-25-2025, 09:26:29 GMT

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country: North America > United States (0.27)

Genre: Research Report (0.46)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

2a4310c4fd24bd336aa2f64f93cb5d39-Paper-Conference.pdf

Neural Information Processing SystemsSep-25-2025, 09:17:37 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Industry: Energy > Oil & Gas > Upstream (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

Add feedback

56577889b3c1cd083b6d7b32d32f99d5-Paper.pdf

Neural Information Processing SystemsSep-25-2025, 03:21:33 GMT

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Industry:

Leisure & Entertainment (0.69)
Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Pure Exploration via Frank-Wolfe Self-Play

Liu, Xinyu, Qin, Chao, You, Wei

arXiv.org Machine LearningSep-25-2025

We study pure exploration in structured stochastic multi-armed bandits, aiming to efficiently identify the correct hypothesis from a finite set of alternatives. For a broad class of tasks, asymptotic analyses reduce to a maximin optimization that admits a two-player zero-sum game interpretation between an experimenter and a skeptic: the experimenter allocates measurements to rule out alternatives while the skeptic proposes alternatives. We reformulate the game by allowing the skeptic to adopt a mixed strategy, yielding a concave-convex saddle-point problem. This viewpoint leads to Frank-Wolfe Self-Play (FWSP): a projection-free, regularization-free, tuning-free method whose one-hot updates on both sides match the bandit sampling paradigm. However, structural constraints introduce sharp pathologies that complicate algorithm design and analysis: our linear-bandit case study exhibits nonunique optima, optimal designs with zero mass on the best arm, bilinear objectives, and nonsmoothness at the boundary. We address these challenges via a differential-inclusion argument, proving convergence of the game value for best-arm identification in linear bandits. Our analysis proceeds through a continuous-time limit: a differential inclusion with a Lyapunov function that decays exponentially, implying a vanishing duality gap and convergence to the optimal value. Although Lyapunov analysis requires differentiability of the objective, which is not guaranteed on the boundary, we show that along continuous trajectories the algorithm steers away from pathological nonsmooth points and achieves uniform global convergence to the optimal game value. We then embed the discrete-time updates into a perturbed flow and show that the discrete game value also converges. Building on FWSP, we further propose a learning algorithm based on posterior sampling. Numerical experiments demonstrate a vanishing duality gap.

algorithm, convergence, linear bandit, (14 more...)

arXiv.org Machine Learning

2509.19901

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.48)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Add feedback

Learning from Observation: A Survey of Recent Advances

Burnwal, Returaj, Mehta, Hriday, Bhatt, Nirav Pravinbhai, Ravindran, Balaraman

arXiv.org Machine LearningSep-25-2025

Imitation Learning (IL) algorithms offer an efficient way to train an agent by mimicking an expert's behavior without requiring a reward function. IL algorithms often necessitate access to state and action information from expert demonstrations. Although expert actions can provide detailed guidance, requiring such action information may prove impractical for real-world applications where expert actions are difficult to obtain. To address this limitation, the concept of learning from observation (LfO) or state-only imitation learning (SOIL) has recently gained attention, wherein the imitator only has access to expert state visitation information. In this paper, we present a framework for LfO and use it to survey and classify existing LfO methods in terms of their trajectory construction, assumptions and algorithm's design choices. This survey also draws connections between several related fields like offline RL, model-based RL and hierarchical RL. Finally, we use our framework to identify open problems and suggest future research directions.

demonstration, imitator policy, learning, (15 more...)

arXiv.org Machine Learning

2509.19379

Country: