Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling

Liu, Yuejiang; Hamid, Jubayer Ibn; Xie, Annie; Lee, Yoonho; Du, Maximilian; Finn, Chelsea

arXiv.org Artificial Intelligence 

The increasing availability of human demonstrations has spurred renewed interest in behavioral cloning [1, 2]. In particular, recent studies have highlighted the potential of learning from large-scale demonstrations to acquire a variety of complex skills [3, 4, 5, 6, 7, 8]. However, this approach still struggles with two common properties of human demonstrations: (i) strong temporal dependencies across multiple steps, such as idle pauses [4] and latent strategies [9, 10]; and (ii) large style variability across demonstrations, including differences in proficiency [11] and preference [12]. Both properties are often prevalent yet unlabeled in the collected data, posing significant challenges to traditional behavioral cloning, which typically learns a discriminative model that maps an input state to a target action. In response to these challenges, recent works have pursued a generative approach characterized by two key elements: (i) predicting a sequence of actions over multiple time steps and executing all or part of that sequence, known as action chunking [3] or receding horizon [4]; and (ii) modeling the distribution of action chunks and sampling from the learned model in an independent [4, 13] or weakly dependent [3, 14] manner during deployment. Some studies find these elements crucial for learning a performant policy in controlled laboratory scenarios [3, 4], while other recent work reports the opposite outcome under practical conditions [6]. The reasons behind these conflicting results remain unclear.
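To make the action-chunking loop described above concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The one-dimensional state, the `sample_chunk` stand-in for a learned generative policy, and the parameter names `horizon` and `execute_steps` are all hypothetical and chosen only for illustration. The policy is queried for a chunk of `horizon` actions, the first `execute_steps` of them are executed open-loop, and the policy is then queried again from the new state.

```python
import random


def sample_chunk(state, horizon, rng):
    """Hypothetical stand-in for a learned generative policy.

    Returns a list of `horizon` actions sampled from a toy distribution
    that pushes a 1-D state toward zero. A real policy would be a trained
    model; this function only illustrates the interface.
    """
    chunk = []
    predicted = state
    for _ in range(horizon):
        action = -0.5 * predicted + rng.gauss(0.0, 0.01)
        chunk.append(action)
        predicted += action  # the policy plans on its own predicted state
    return chunk


def rollout(initial_state, total_steps, horizon, execute_steps, seed=0):
    """Run action chunking in a toy environment where state += action.

    execute_steps == horizon executes each chunk fully (open loop);
    execute_steps == 1 replans at every step (closed loop).
    """
    if not 1 <= execute_steps <= horizon:
        raise ValueError("execute_steps must be between 1 and horizon")
    rng = random.Random(seed)
    state = initial_state
    trajectory = [state]
    num_queries = 0
    while len(trajectory) - 1 < total_steps:
        # Each query draws a fresh chunk, independent of earlier chunks.
        chunk = sample_chunk(state, horizon, rng)
        num_queries += 1
        for action in chunk[:execute_steps]:
            if len(trajectory) - 1 >= total_steps:
                break
            state += action
            trajectory.append(state)
    return trajectory, num_queries


if __name__ == "__main__":
    _, full_chunk_queries = rollout(1.0, total_steps=8, horizon=4, execute_steps=4)
    _, per_step_queries = rollout(1.0, total_steps=8, horizon=4, execute_steps=1)
    print(full_chunk_queries, per_step_queries)
```

In this sketch, a larger `execute_steps` commits to a sampled plan for longer, while a smaller value reacts to the observed state more often at the cost of more policy queries.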
