Optimizing Latent Goal by Learning from Trajectory Preference
Zhao, Guangyu, Lian, Kewei, Lin, Haowei, Fu, Haobo, Fu, Qiang, Cai, Shaofei, Wang, Zihao, Liang, Yitao
arXiv.org Artificial Intelligence
Recently, pre-training foundation policies in open-world environments on web-scale unlabeled datasets has become an increasingly popular trend in sequential control (Baker et al., 2022; Brohan et al., 2023a; Collaboration et al., 2024; Yang et al., 2023; Zhang et al., 2022). These foundation policies possess broad world knowledge that can be transferred to downstream tasks. Among foundation policies, goal-conditioned policies take a goal (instruction) as input and execute the corresponding task (Chane-Sane et al., 2021; Ding et al., 2019). The goal can come in different modalities, such as text instructions (Lifshitz et al., 2024), video demonstrations (Cai et al., 2023b), or multimodal instructions (Brohan et al., 2023a,b; Cai et al., 2024). However, much like large language models, these instruction-following policies are highly sensitive to the choice of "prompts" (Kim et al., 2024; Lifshitz et al., 2024; Wang et al., 2023a,b). Researchers rely on manual trial and error to find the best prompt, and prompt quality sometimes does not align with human judgment. For instance, OpenVLA (Kim et al., 2024) shows a large performance gap between "Pepsi can" and "Pepsi" as the prompt, and for the same task of collecting wood logs, GROOT's performance varies significantly depending on the reference video used. Moreover, it is unclear whether an agent's failure to complete a task stems from the foundation policy's inherent limitations or from the lack of a suitable prompt. A common view in the LLM community holds that most abilities are learned during the pre-training phase (Ouyang et al., 2022; Zhao et al., 2023a), while post-training is a method to
Dec-2-2024