Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

Kun Chu, Xufeng Zhao, Cornelius Weber, Mengdi Li, Stefan Wermter

arXiv.org Artificial Intelligence 

Reinforcement Learning (RL) has shown its power in solving sequential decision-making problems in the robotic domain [1, 2] by optimizing control policies directly through trial-and-error interactions with the environment. However, several challenges remain [3], such as sample inefficiency and the difficulty of specifying rewards, which limit its application in the field. Inspired by how humans learn skills from more knowledgeable people such as teachers or supervisors, a potential solution to these limitations is to learn from human expert guidance, thereby injecting additional information into the learning process. Human guidance has been shown to accelerate the learning of new tasks by providing additional rewards or advice, whether through demonstrations [4, 5] or feedback [6, 7, 8, 9]. However, collecting sufficient human guidance is time-consuming and costly.

Recently, Large Language Models (LLMs) have shown remarkable abilities to generate human-like responses in the textual domain [10, 11], and their applications have been explored in the robotic domain. While some approaches prompt LLMs to instruct robots in performing tasks [12, 13, 14, 15], they focus on utilizing the LLMs' common-sense knowledge to give high-level advice for employing pre-trained or hard-coded low-level control policies, which require extensive data collection or expert knowledge, respectively. Since these works do not perform policy learning while executing tasks with LLMs, the robots' performance depends heavily on the LLM's capabilities and its consistent availability each time a task is executed.
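To make the idea of injecting evaluative feedback into policy learning concrete, the following minimal sketch shows a toy Q-learning loop in which an external feedback signal is scaled and added to a sparse environment reward. The feedback function here is a hand-coded stub standing in for an LLM's evaluative response to a textual description of a transition; all names and parameters (llm_feedback, beta, the 1-D grid-world setup) are illustrative assumptions and not the authors' implementation.

# Minimal sketch (illustrative, not the paper's method): Q-learning on a toy
# 1-D grid world where an external "feedback" signal -- a stub standing in
# for an LLM's evaluative response -- is added to the environment reward.
# All names (llm_feedback, GOAL, beta, ...) are hypothetical.

import random

N_STATES, GOAL = 10, 9          # states 0..9, goal at the right end
ACTIONS = [-1, +1]              # move left / move right


def env_step(state, action):
    """Environment transition with a sparse reward only at the goal."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    done = next_state == GOAL
    return next_state, reward, done


def llm_feedback(state, action):
    """Stub for evaluative feedback an LLM could give from a textual
    description of the transition; here: +1 if the action moves toward
    the goal, -1 otherwise (a hand-coded stand-in)."""
    return 1.0 if action > 0 else -1.0


def train(episodes=200, alpha=0.1, gamma=0.95, eps=0.1, beta=0.1):
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection over the Q-table.
            a_idx = (random.randrange(2) if random.random() < eps
                     else max(range(2), key=lambda i: q[state][i]))
            action = ACTIONS[a_idx]
            next_state, r_env, done = env_step(state, action)
            # Shaped reward: environment reward plus scaled feedback signal.
            r = r_env + beta * llm_feedback(state, action)
            target = r + (0.0 if done else gamma * max(q[next_state]))
            q[state][a_idx] += alpha * (target - q[state][a_idx])
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train()
    greedy = [ACTIONS[max(range(2), key=lambda i: q_table[s][i])]
              for s in range(N_STATES)]
    print("Greedy action per state:", greedy)

In this sketch the dense feedback term compensates for the sparse environment reward, which is one simple way feedback can speed up learning; an actual LLM-based variant would replace llm_feedback with a call that describes the transition in text and parses the model's judgment.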
