AITopics | hybrid reinforcement learning

Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning

Neural Information Processing SystemsDec-26-2025, 13:38:09 GMT

This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes access to both an offline dataset and online interactions with the unknown environment. A central question boils down to how to efficiently utilize online data to strengthen and complement the offline dataset and enable effective policy fine-tuning. Leveraging recent advances in reward-agnostic exploration and offline RL, we design a three-stage hybrid RL algorithm that beats the best of both worlds --- pure offline RL and pure online RL --- in terms of sample complexities. The proposed algorithm does not require any reward information during data collection.

hybrid reinforcement learning, provable statistical benefit, reward-agnostic fine-tuning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Add feedback

Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning

Neural Information Processing SystemsMay-27-2025, 08:12:03 GMT

This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes access to both an offline dataset and online interactions with the unknown environment. A central question boils down to how to efficiently utilize online data to strengthen and complement the offline dataset and enable effective policy fine-tuning. Leveraging recent advances in reward-agnostic exploration and offline RL, we design a three-stage hybrid RL algorithm that beats the best of both worlds --- pure offline RL and pure online RL --- in terms of sample complexities. The proposed algorithm does not require any reward information during data collection. Our theory is developed based on a new notion called single-policy partial concentrability, which captures the trade-off between distribution mismatch and miscoverage and guides the interplay between offline and online data.

artificial intelligence, hybrid reinforcement learning, machine learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning

Neural Information Processing SystemsJan-19-2025, 19:05:51 GMT

This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes access to both an offline dataset and online interactions with the unknown environment. A central question boils down to how to efficiently utilize online data to strengthen and complement the offline dataset and enable effective policy fine-tuning. Leveraging recent advances in reward-agnostic exploration and offline RL, we design a three-stage hybrid RL algorithm that beats the best of both worlds --- pure offline RL and pure online RL --- in terms of sample complexities. The proposed algorithm does not require any reward information during data collection. Our theory is developed based on a new notion called single-policy partial concentrability, which captures the trade-off between distribution mismatch and miscoverage and guides the interplay between offline and online data.

hybrid reinforcement learning, provable statistical benefit, reward-agnostic fine-tuning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Hybrid Reinforcement Learning and Its Application to Biped Robot Control

Neural Information Processing SystemsApr-6-2023, 17:50:02 GMT

A learning system composed of linear control modules, reinforce(cid:173) ment learning modules and selection modules (a hybrid reinforce(cid:173) ment learning system) is proposed for the fast learning of real-world control problems. The selection modules choose one appropriate control module dependent on the state. This hybrid learning sys(cid:173) tem was applied to the control of a stilt-type biped robot. It learned the control on a sloped floor more quickly than the usual reinforce(cid:173) ment learning because it did not need to learn the control on a flat floor, where the linear control module can control the robot. When it was trained by a 2-step learning (during the first learning step, the selection module was trained by a training procedure con(cid:173) trolled only by the linear controller), it learned the control more quickly.

biped robot control, control module, hybrid reinforcement learning, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

Hybrid Reinforcement Learning and Its Application to Biped Robot Control

Yamada, Satoshi, Watanabe, Akira, Nakashima, Michio

Neural Information Processing SystemsApr-22-2020, 20:38:26 GMT

Papers published at the Neural Information Processing Systems Conference.

application, biped robot control, hybrid reinforcement learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback

Hybrid Reinforcement Learning with Expert State Sequences

Guo, Xiaoxiao, Chang, Shiyu, Yu, Mo, Tesauro, Gerald, Campbell, Murray

arXiv.org Artificial IntelligenceMar-10-2019

Existing imitation learning approaches often require that the complete demonstration data, including sequences of actions and states, are available. In this paper, we consider a more realistic and difficult scenario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert actions are unobserved. We propose a novel tensor-based model to infer the unobserved actions of the expert state sequences. The policy of the agent is then optimized via a hybrid objective combining reinforcement learning and imitation learning. We evaluated our hybrid approach on an illustrative domain and Atari games. The empirical results show that (1) the agents are able to leverage state expert sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in expert state sequences.

action inference model, agent, learning, (11 more...)

arXiv.org Artificial Intelligence

1903.0411

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (0.72)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Hybrid Reinforcement Learning and Its Application to Biped Robot Control

Yamada, Satoshi, Watanabe, Akira, Nakashima, Michio

Neural Information Processing SystemsDec-31-1998

Advanced Technology R&D Center Mitsubishi Electric Corporation Amagasaki, Hyogo 661-0001, Japan Abstract A learning system composed of linear control modules, reinforcement learningmodules and selection modules (a hybrid reinforcement learning system) is proposed for the fast learning of real-world control problems. The selection modules choose one appropriate control module dependent on the state. It learned the control on a sloped floor more quickly than the usual reinforcement learningbecause it did not need to learn the control on a flat floor, where the linear control module can control the robot. When it was trained by a 2-step learning (during the first learning step, the selection module was trained by a training procedure controlled onlyby the linear controller), it learned the control more quickly. The average number of trials (about 50) is so small that the learning system is applicable to real robot control. 1 Introduction Reinforcement learning has the ability to solve general control problems because it learns behavior through trial-and-error interactions with a dynamic environment.

machine learning, reinforcement, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: Asia > Japan (0.24)

Industry: Automobiles & Trucks > Manufacturer (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Filters

Collaborating Authors

hybrid reinforcement learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning

Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning

Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning

Hybrid Reinforcement Learning and Its Application to Biped Robot Control

Hybrid Reinforcement Learning and Its Application to Biped Robot Control

Hybrid Reinforcement Learning with Expert State Sequences

Hybrid Reinforcement Learning and Its Application to Biped Robot Control