DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

Li, Jianxiong, Zheng, Jinliang, Zheng, Yinan, Mao, Liyuan, Hu, Xiao, Cheng, Sijie, Niu, Haoyi, Liu, Jihao, Liu, Yu, Liu, Jingjing, Zhang, Ya-Qin, Zhan, Xianyuan

May-23-2024–arXiv.org Artificial Intelligence

Multimodal pretraining is an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progressions; 2) enforcing temporal consistency of visual representation; 3) capturing trajectory-level language grounding. Most existing methods approach these via separate objectives, which often reach sub-optimal solutions. In this paper, we propose a universal unified objective that can simultaneously extract meaningful task progression information from image sequences and seamlessly align them with language instructions. We discover that via implicit preferences, where a visual trajectory inherently aligns better with its corresponding language instruction than mismatched pairs, the popular Bradley-Terry model can be transformed into a representation learning objective through proper reward reparameterizations. The resulting framework, DecisionNCE, mirrors an InfoNCE-style objective but is distinctively tailored for decision-making tasks, providing an embodied representation learning framework that elegantly extracts both local and global task progression features, with temporal consistency enforced through implicit time contrastive learning, while ensuring trajectory-level instruction grounding via multimodal joint encoding. Evaluation on both simulated and real robots demonstrates that DecisionNCE effectively facilitates diverse downstream policy learning tasks, offering a versatile solution for unified representation and reward learning. Project Page: https://2toinf.github.io/DecisionNCE/
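The link the abstract draws between the Bradley-Terry model and an InfoNCE-style objective can be illustrated with a minimal sketch. It is not the authors' implementation: the function names and the scalar trajectory scores are hypothetical placeholders, and the abstract does not specify how the reward is reparameterized in terms of image and language embeddings, so the scores are simply taken as given. The sketch only shows that the Bradley-Terry negative log-likelihood for one matched and one mismatched trajectory is the two-candidate case of a softmax (InfoNCE-style) loss over several candidates.

```python
import math


def bradley_terry_nll(score_matched, score_mismatched):
    """Negative log-probability that the matched trajectory is preferred
    over a single mismatched one under the Bradley-Terry model."""
    p_matched = math.exp(score_matched) / (
        math.exp(score_matched) + math.exp(score_mismatched)
    )
    return -math.log(p_matched)


def info_nce_nll(scores, matched_index):
    """Softmax cross-entropy over candidate trajectory scores for one
    instruction (InfoNCE-style), using a numerically stable log-sum-exp."""
    top = max(scores)
    log_partition = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_partition - scores[matched_index]


# Hypothetical scores of three trajectories under one instruction;
# index 0 is assumed to be the trajectory that matches the instruction.
scores = [2.0, 0.5, -1.0]

pairwise_loss = bradley_terry_nll(scores[0], scores[1])
two_way_loss = info_nce_nll(scores[:2], matched_index=0)
multi_way_loss = info_nce_nll(scores, matched_index=0)
```

With two candidates the two losses coincide. Adding further mismatched trajectories only enlarges the normalizing sum, so the multi-way loss is never smaller than the pairwise one.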

artificial intelligence, decisionnce, machine learning

Country:
- Asia (0.28)
- Europe > Austria
  - Vienna (0.14)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.67)
  - Robots (1.00)
