 Agents




Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Zihao Wang

Neural Information Processing Systems

These additional behavior tokens are added to the vocabulary of pretrained Multimodal Language Models. With this encoder, we then pack long-term multimodal interactions involving task instructions, memories, thoughts, observations, textual responses, behavior trajectories, etc.







a3621ee907def47c1b952ade25c67698-Paper-Conference.pdf

Neural Information Processing Systems

This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents, and provides insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing.