AITopics | lower-level policy

Collaborating Authors

lower-level policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Data-Efficient Hierarchical Reinforcement Learning

Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, Sergey Levine

Neural Information Processing SystemsFeb-15-2026, 01:50:45 GMT

Neural Information Processing Systems http://nips.cc/

arxiv preprint arxiv, higher-level policy, lower-level policy, (10 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Data-Efficient Hierarchical Reinforcement Learning

Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, Sergey Levine

Neural Information Processing SystemsNov-20-2025, 20:53:34 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reviews: Data-Efficient Hierarchical Reinforcement Learning

Neural Information Processing SystemsOct-8-2024, 08:11:19 GMT

Summary The authors present a heirarchical reinforcement learning approach which learns at two levels, a higher level agent that is learning to perform actions in the form of medium term goals (changes in the state variable) and a low level agent that is aiming to (and rewarded for) achieving these medium term goals by performing atomic level actions. The key contributions identified by the authors are that learning at both lower and higher level are off-policy and take advantage of recent developments in off-policy learning. The authors say that the more challenging aspect of this, is the off policy learning at the higher level, as the actions (sub-goals) chosen during early experience are not effectively met by the low level policy. Their solution is to instead replace (or augment) high level experience with synthetic high level actions (sub-goals) which would be more likely to have happened based on the current instantiation of the low level controller. An additional key feature is that the sub-goals, rather than given in terms of absolute (observed) states, are instead given in terms of relative states (deltas), and there is a mechanism to update this sub-goal appropirately as the low level controller advances.

data-efficient hierarchical reinforcement learning, level agent, lower-level policy, (11 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

Add feedback

DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning

Singh, Utsav, Chakraborty, Souradip, Suttle, Wesley A., Sadler, Brian M., Namboodiri, Vinay P, Bedi, Amrit Singh

arXiv.org Artificial IntelligenceJun-16-2024

Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, the complexity of such tasks typically requires learning policies to perform a variety of subtasks, then combining them to achieve the overall goal. At the same time, comprehensive, well-engineered reward functions are typically unavailable in such problems, while limited human preference data often is; making efficient use of such data to guide learning is therefore essential. Methods for learning to perform complex robotics tasks from human preference data must overcome both these challenges simultaneously. In this work, we introduce DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning, an efficient hierarchical approach that leverages direct preference optimization to learn a higher-level policy and reinforcement learning to learn a lower-level policy. DIPPER enjoys improved computational efficiency due to its use of direct preference optimization instead of standard preference-based approaches such as reinforcement learning from human feedback, while it also mitigates the well-known hierarchical reinforcement learning issues of non-stationarity and infeasible subgoal generation due to our use of primitive-informed regularization inspired by a novel bi-level optimization formulation of the hierarchical reinforcement learning problem. To validate our approach, we perform extensive experimental analysis on a variety of challenging robotics tasks, demonstrating that DIPPER outperforms hierarchical and non-hierarchical baselines, while ameliorating the non-stationarity and infeasible subgoal generation issues of hierarchical reinforcement learning.

formulation, higher-level policy, objective, (15 more...)

arXiv.org Artificial Intelligence

2406.10892

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
North America > United States > Maryland > Prince George's County > Adelphi (0.04)
(5 more...)

Genre: Research Report (0.51)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

Singh, Utsav, Suttle, Wesley A., Sadler, Brian M., Namboodiri, Vinay P., Bedi, Amrit Singh

arXiv.org Artificial IntelligenceJun-16-2024

In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mitigate non-stationarity, which is common in existing hierarchical approaches, and demonstrates impressive performance across a range of challenging sparse-reward tasks. Since obtaining human feedback is typically impractical, we propose to replace the human-in-the-loop approach with our primitive-in-the-loop approach, which generates feedback using sparse rewards provided by the environment. Moreover, in order to prevent infeasible subgoal prediction and avoid degenerate solutions, we propose primitive-informed regularization that conditions higher-level policies to generate feasible subgoals for lower-level policies. We perform extensive experiments to show that PIPER mitigates non-stationarity in hierarchical reinforcement learning and achieves greater than 50$\%$ success rates in challenging, sparse-reward robotic environments, where most other baselines fail to achieve any significant progress.

baseline, learning, piper, (15 more...)

arXiv.org Artificial Intelligence

2404.13423

Country:

North America > United States > Florida > Orange County > Orlando (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Zero-shot Transfer Learning of Driving Policy via Socially Adversarial Traffic Flow

Zhang, Dongkun, Xue, Jintao, Cui, Yuxiang, Wang, Yunkai, Liu, Eryun, Jing, Wei, Chen, Junbo, Xiong, Rong, Wang, Yue

arXiv.org Artificial IntelligenceApr-25-2023

Acquiring driving policies that can transfer to unseen environments is challenging when driving in dense traffic flows. The design of traffic flow is essential and previous studies are unable to balance interaction and safety-criticism. To tackle this problem, we propose a socially adversarial traffic flow. We propose a Contextual Partially-Observable Stochastic Game to model traffic flow and assign Social Value Orientation (SVO) as context. We then adopt a two-stage framework. In Stage 1, each agent in our socially-aware traffic flow is driven by a hierarchical policy where upper-level policy communicates genuine SVOs of all agents, which the lower-level policy takes as input. In Stage 2, each agent in the socially adversarial traffic flow is driven by the hierarchical policy where upper-level communicates mistaken SVOs, taken by the lower-level policy trained in Stage 1. Driving policy is adversarially trained through a zero-sum game formulation with upper-level policies, resulting in a policy with enhanced zero-shot transfer capability to unseen traffic flows. Comprehensive experiments on cross-validation verify the superior zero-shot transfer performance of our method.

artificial intelligence, machine learning, traffic flow, (17 more...)

arXiv.org Artificial Intelligence

2304.12821

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.46)

Industry: Consumer Products & Services > Travel (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.93)

Add feedback

H-TSP: Hierarchically Solving the Large-Scale Travelling Salesman Problem

Pan, Xuanhao, Jin, Yan, Ding, Yuandong, Feng, Mingxiao, Zhao, Li, Song, Lei, Bian, Jiang

arXiv.org Artificial IntelligenceApr-18-2023

We propose an end-to-end learning framework based on hierarchical reinforcement learning, called H-TSP, for addressing the large-scale Travelling Salesman Problem (TSP). The proposed H-TSP constructs a solution of a TSP instance starting from the scratch relying on two components: the upper-level policy chooses a small subset of nodes (up to 200 in our experiment) from all nodes that are to be traversed, while the lower-level policy takes the chosen nodes as input and outputs a tour connecting them to the existing partial route (initially only containing the depot). After jointly training the upper-level and lower-level policies, our approach can directly generate solutions for the given TSP instances without relying on any time-consuming search procedures. To demonstrate effectiveness of the proposed approach, we have conducted extensive experiments on randomly generated TSP instances with different numbers of nodes. We show that H-TSP can achieve comparable results (gap 3.42% vs. 7.32%) as SOTA search-based approaches, and more importantly, we reduce the time consumption up to two orders of magnitude (3.32s vs. 395.85s). To the best of our knowledge, H-TSP is the first end-to-end deep reinforcement learning approach that can scale to TSP instances of up to 10000 nodes. Although there are still gaps to SOTA results with respect to solution quality, we believe that H-TSP will be useful for practical applications, particularly those that are time-sensitive e.g., on-call routing and ride hailing service.

machine learning, node, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2304.09395

Country:

Asia > China (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Passenger (0.68)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Training Transition Policies via Distribution Matching for Complex Tasks

Byun, Ju-Seung, Perrault, Andrew

arXiv.org Artificial IntelligenceOct-8-2021

Humans decompose novel complex tasks into simpler ones to exploit previously learned skills. Analogously, hierarchical reinforcement learning seeks to leverage lower-level policies for simple tasks to solve complex ones. However, because each lower-level policy induces a different distribution of states, transitioning from one lower-level policy to another may fail due to an unexpected starting state. We introduce transition policies that smoothly connect lower-level policies by producing a distribution of states and actions that matches what is expected by the next policy. Training transition policies is challenging because the natural reward signal -- whether the next policy can execute its subtask successfully -- is sparse. By training transition policies via adversarial inverse reinforcement learning to match the distribution of expected states and actions, we avoid relying on task-based reward. To further improve performance, we use deep Q-learning with a binary action space to determine when to switch from a transition policy to the next pre-trained policy, using the success or failure of the next subtask as the reward. Although the reward is still sparse, the problem is less severe due to the simple binary action space. We demonstrate our method on continuous bipedal locomotion and arm manipulation tasks that require diverse skills. We show that it smoothly connects the lower-level policies, achieving higher success rates than previous methods that search for successful trajectories based on a reward function, but do not match the state distribution.

lower-level policy, pre-trained policy, transition policy, (12 more...)

arXiv.org Artificial Intelligence

2110.04357

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning?

Nachum, Ofir, Tang, Haoran, Lu, Xingyu, Gu, Shixiang, Lee, Honglak, Levine, Sergey

arXiv.org Artificial IntelligenceSep-23-2019

Hierarchical reinforcement learning has demonstrated significant success at solving difficult reinforcement learning (RL) tasks. Previous works have motivated the use of hierarchy by appealing to a number of intuitive benefits, including learning over temporally extended transitions, exploring over temporally extended periods, and training and exploring in a more semantically meaningful action space, among others. However, in fully observed, Markovian settings, it is not immediately clear why hierarchical RL should provide benefits over standard "shallow" RL architectures. In this work, we isolate and evaluate the claimed benefits of hierarchical RL on a suite of tasks encompassing locomotion, navigation, and manipulation. Surprisingly, we find that most of the observed benefits of hierarchy can be attributed to improved exploration, as opposed to easier policy learning or imposed hierarchical structures. Given this insight, we present exploration techniques inspired by hierarchy that achieve performance competitive with hierarchical RL while at the same time being much simpler to use and implement.

agent, exploration, hierarchy, (14 more...)

arXiv.org Artificial Intelligence

1909.10618

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Data-Efficient Hierarchical Reinforcement Learning -- HIRO

#artificialintelligenceSep-5-2019, 11:48:10 GMT

Traditional reinforcement learning algorithms have achieved encouraging success in recent years. Their nature of reasoning on the atomic scale, however, makes them hard to scale to complex tasks. Hierarchical Reinforcement Learning(HRL) introduces high-level abstraction, whereby the agent is able to plan on different scales. In this post, we discuss an HRL algorithm proposed by Ofir Nachum et al. in Google Brain at NIPS 2018. The algorithm, known as HIerarchical Reinforcement learning with Off-policy correction(HIRO), is designed for goal-directed tasks, in which the agent tries to reach some goal state.

hierarchical reinforcement learning, machine learning, reinforcement learning, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback