 avdc


Learning to Act from Actionless Videos through Dense Correspondences

arXiv.org Machine Learning

In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from a few video demonstrations, without using any action annotations. By synthesizing videos that "hallucinate" a robot executing actions and combining them with dense correspondences between frames, our approach can infer closed-form actions to execute in an environment without the need for any explicit action labels. This unique capability allows us to train the policy solely on RGB videos and deploy learned policies to various robotic tasks. We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks. Additionally, we contribute an open-source framework for efficient video modeling, enabling the training of high-fidelity policy models with four GPUs within a single day.
A goal of robot learning is to construct a policy that can successfully and robustly execute diverse tasks across various robots and environments. A major obstacle is the diversity present in different robotic tasks. The state representation necessary to fold a cloth differs substantially from the one needed for pouring water, picking and placing objects, or navigating, requiring a policy that can process each state representation that arises. Furthermore, the action representation needed to execute each task varies significantly with differences in motor actuation, gripper shape, and task goals, requiring a policy that can correctly deduce an action to execute across different robots and tasks. One approach to solve this issue is to use images as a task-agnostic method for encoding both the states and the actions to execute. In this setting, policy prediction involves synthesizing a video that depicts the actions a robot should execute (Finn & Levine, 2017; Kurutach et al., 2018; Du et al., 2023), enabling different states and actions to be encoded in a modality-agnostic manner.
However, directly predicting an image representation of what a robot should do does not explicitly encode the robot actions required to do it. To address this, past works either learn an action-specific video prediction model (Finn & Levine, 2017) or a task-specific inverse-dynamics model to predict actions from videos (Du et al., 2023). Both approaches rely on task-specific action labels, which can be expensive to collect in practice, preventing general policy prediction across different robot tasks. This work presents a method that first synthesizes a video rendering the desired task execution; then, it directly regresses actions from the synthesized video without requiring any action labels or task-specific inverse-dynamics model, enabling us to directly formulate policy learning as a video generation problem.
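To make the idea of recovering an action in closed form from dense correspondences more concrete, here is a minimal sketch. It assumes the correspondences between two frames have already been lifted to matched 3D points (for example via optical flow plus depth), and it uses the standard Kabsch least-squares solution for a rigid transform. The function name `rigid_transform_from_correspondences` is hypothetical, and this is an illustration of the general technique, not necessarily the procedure used in the paper.

```python
import numpy as np


def rigid_transform_from_correspondences(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t.

    src, dst: arrays of shape (N, D) holding matched points (assumed inputs).
    Uses the Kabsch algorithm: center both point sets, take the SVD of the
    cross-covariance matrix, and correct for a possible reflection.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)

    # Cross-covariance between the centered point sets, shape (D, D).
    H = (src - src_centroid).T @ (dst - dst_centroid)
    U, _, Vt = np.linalg.svd(H)

    # Force a proper rotation (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0] * (src.shape[1] - 1) + [d])

    R = Vt.T @ D @ U.T
    t = dst_centroid - R @ src_centroid
    return R, t
```

Given such an `(R, t)` for the points on a tracked object or on the scene, a controller could in principle turn the estimated motion into a target pose for the robot; how that mapping is done is outside what this excerpt describes.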


Artificial intelligence is coming... to your council

#artificialintelligence

Artificial intelligence is coming to Aylesbury Vale... and being used by our council. AVDC say they're going to be using things like Amazon's new Echo system to help deliver services better. "So the idea with this is that people can talk to a device and actually engage directly with the council, rather than having to come in, ring us or do anything else. So it's about using the clever things that are out there today to make things better." You won't be able to do this straight away; this is all part of a long-term plan. The five-year plan, called 'Connected Knowledge', focuses on ensuring customers continue to have the best possible experience by providing a digital programme that's more efficient, flexible and better value for money. AVDC was one of the first councils to adopt a cloud IT strategy, saving around £6 million. Connected Knowledge sees its digital systems evolve further, working towards fully integrated and connected transactions for customers. And through the use of Amazon Echo technology, developed with Arcus Global (AVDC's development partner), AVDC will become the first council in the UK to use AI (artificial intelligence) and AI-powered voice control to serve residents' needs. "Arcus Global is delighted to be collaborating with AVDC on the very first Amazon Echo integration with a local authority platform.