real world environment
Generative Visual Foresight Meets Task-Agnostic Pose Estimation in Robotic Table-Top Manipulation
Zhang, Chuye; Zhang, Xiaoxiong; Pan, Wei; Zheng, Linfang; Zhang, Wei
Robotic manipulation in unstructured environments requires systems that can generalize across diverse tasks while maintaining robust and reliable performance. We introduce GVF-TAPE, a closed-loop framework that combines generative visual foresight with task-agnostic pose estimation to enable scalable robotic manipulation. GVF-TAPE employs a generative video model to predict future RGB-D frames from a single side-view RGB image and a task description, offering visual plans that guide robot actions. A decoupled pose estimation model then extracts end-effector poses from the predicted frames, translating them into executable commands via low-level controllers. By iteratively integrating video foresight and pose estimation in a closed loop, GVF-TAPE achieves real-time, adaptive manipulation across a broad range of tasks. Extensive experiments in both simulation and real-world settings demonstrate that our approach reduces reliance on task-specific action data and generalizes effectively, providing a practical and scalable solution for intelligent robotic systems.
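The predict-estimate-execute cycle described in the abstract can be summarized in a short control sketch. The class and method names below (the foresight model, pose estimator, `estimate`, `move_to`, and the camera interface) are illustrative assumptions, not the authors' released API; this is a minimal sketch of the closed loop, not the actual implementation.

```python
# Minimal sketch of the closed-loop structure described in the abstract.
# All component interfaces here are hypothetical placeholders.

from dataclasses import dataclass
import numpy as np


@dataclass
class EndEffectorPose:
    position: np.ndarray      # (3,) xyz in the robot base frame
    orientation: np.ndarray   # (4,) quaternion
    gripper_open: bool


class GVFTAPELoop:
    def __init__(self, foresight_model, pose_estimator, controller):
        self.foresight = foresight_model      # generative video model: RGB + task -> future RGB-D
        self.pose_estimator = pose_estimator  # task-agnostic pose model: RGB-D frame -> pose
        self.controller = controller          # low-level controller executing pose targets

    def step(self, rgb_frame: np.ndarray, task: str) -> EndEffectorPose:
        # 1. Predict a short horizon of future RGB-D frames from one side view.
        predicted_frames = self.foresight.predict(rgb_frame, task)
        # 2. Extract the end-effector pose from the next predicted frame.
        target_pose = self.pose_estimator.estimate(predicted_frames[0])
        # 3. Hand the pose to the low-level controller.
        self.controller.move_to(target_pose)
        return target_pose

    def run(self, camera, task: str, max_steps: int = 50):
        # Re-plan from the latest observation at every step (closed loop).
        for _ in range(max_steps):
            rgb = camera.capture()
            self.step(rgb, task)
            if self.controller.task_done():
                break
```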
ViSTa Dataset: Do vision-language models understand sequential tasks?
Wybitul, Evžen; Gunter, Evan Ryan; Seleznyov, Mikhail; Lindner, David
Using vision-language models (VLMs) as reward models in reinforcement learning holds promise for reducing costs and improving safety. So far, VLM reward models have only been used for goal-oriented tasks, where the agent must reach a particular final outcome. We explore VLMs' potential to supervise tasks that cannot be scored by the final state alone. To this end, we introduce ViSTa, a dataset for evaluating Vision-based understanding of Sequential Tasks. ViSTa comprises over 4,000 videos with step-by-step descriptions in virtual home, Minecraft, and real-world environments. Its novel hierarchical structure -- basic single-step tasks composed into more and more complex sequential tasks -- allows a fine-grained understanding of how well VLMs can judge tasks with varying complexity. To illustrate this, we use ViSTa to evaluate state-of-the-art VLMs, including CLIP, ViCLIP, and GPT-4o. We find that, while they are all good at object recognition, they fail to understand sequential tasks, with only GPT-4o achieving non-trivial performance.
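To make the hierarchical structure concrete, here is a small sketch of how single-step tasks could compose into longer sequential tasks and how a VLM might be scored on them. The field names and the `score_video_text` method are hypothetical illustrations, not ViSTa's actual schema or evaluation API.

```python
# Illustrative sketch of the hierarchy described above: basic single-step
# tasks composed into longer sequential tasks. All names are assumptions.

from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskStep:
    description: str          # e.g. "open the fridge"
    video_clip: str           # path to the clip showing this step


@dataclass
class SequentialTask:
    environment: str          # "virtual home", "Minecraft", or "real world"
    steps: List[TaskStep] = field(default_factory=list)

    @property
    def complexity(self) -> int:
        # Level 1 = single-step task; higher levels chain more steps.
        return len(self.steps)

    def full_description(self) -> str:
        return ", then ".join(step.description for step in self.steps)


def prefers_correct_order(vlm, task: SequentialTask, distractor: SequentialTask) -> bool:
    """Return True if the VLM scores the correct step order above a distractor."""
    clips = [step.video_clip for step in task.steps]
    correct = vlm.score_video_text(clips, task.full_description())
    wrong = vlm.score_video_text(clips, distractor.full_description())
    return correct > wrong
```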
Reviews: Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias
In this paper, a new dataset for robot grasping is proposed. In contrast to grasping data collected in a lab environment, the authors propose to collect data from real-world environments (homes). To collect data in the wild, they use inexpensive, low-DoF robots. To compensate for the noisy behavior of these poorly calibrated robots, they model the noise as a latent variable and learn it jointly with the grasping task. Results show that combining these ideas yields a grasping model that works well both in lab environments and in new real-world environments.
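A minimal sketch of the latent-noise idea, assuming the noise is represented as a learned per-robot embedding that the grasp predictor is conditioned on; the layer sizes and overall architecture below are illustrative assumptions, not the paper's actual model.

```python
# Sketch: model per-robot noise as a learned latent embedding and condition
# the grasp predictor on it, so noise and grasping are learned jointly.

import torch
import torch.nn as nn


class NoiseAwareGraspNet(nn.Module):
    def __init__(self, num_robots: int, noise_dim: int = 8, feat_dim: int = 128):
        super().__init__()
        # One learned latent noise vector per (poorly calibrated) robot.
        self.robot_noise = nn.Embedding(num_robots, noise_dim)
        self.backbone = nn.Sequential(            # stand-in image encoder
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        # The grasp head sees image features plus the robot's noise latent.
        self.grasp_head = nn.Sequential(
            nn.Linear(feat_dim + noise_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),                     # grasp success logit
        )

    def forward(self, image: torch.Tensor, robot_id: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(image)              # (B, feat_dim)
        z = self.robot_noise(robot_id)            # (B, noise_dim)
        return self.grasp_head(torch.cat([feats, z], dim=-1))
```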
The One [Simple] Method AI Implementers Use For Success
Who do you blame when AI projects fail? The data? You can certainly blame solving the wrong problem with AI, or applying AI when you don't need it at all. But what happens when you have an application well suited to AI and the project still fails? Sometimes it comes down to a simple principle: don't take so long. At a recent Enterprise Data & AI event, a presenter shared that their AI projects take, on average, 18 to 24 months to go from concept to production.
The ingredients of real world robotic reinforcement learning
Robots have been useful in environments that can be carefully controlled, such as industrial settings (e.g., assembly lines). In unstructured settings like the home, however, we need robotic systems that can adapt to the diversity of the real world. Learning-based algorithms have the potential to let robots acquire complex behaviors adaptively in unstructured environments by leveraging data collected from the environment. In particular, with reinforcement learning, robots learn novel behaviors through trial-and-error interaction. This is especially important as we deploy robots in scenarios where the environment may not be known in advance.
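As a concrete illustration of learning by trial and error, the sketch below runs tabular Q-learning on a toy one-dimensional reach-the-goal task. The environment is a stand-in used only to show the interaction loop, not a real robotic setup or any particular method from the article.

```python
# Tabular Q-learning on a toy "move right to reach the goal" task:
# act, observe the reward, and update the value estimate from the outcome.

import random

N_STATES, GOAL = 5, 4          # states 0..4, goal at the right end
ACTIONS = [-1, +1]             # move left or right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

alpha, gamma, epsilon = 0.1, 0.9, 0.2

for episode in range(500):
    state = 0
    while state != GOAL:
        # Explore occasionally, otherwise act greedily (trial and error).
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == GOAL else -0.01
        # Update the value estimate toward the observed outcome.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# After training, the learned first action from the start state is +1 (move right).
print(max(ACTIONS, key=lambda a: q[(0, a)]))
```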
At A Glance – Embodied AI - Disruption Hub
Embodied AI is one of many terms associated with the relentless development of Artificial Intelligence. As the name suggests, it involves equipping software with a physical body and exploring how that body fits into real world environments. Embodied AI is based on embodied cognition – the idea that intelligence is as much a part of the body as the brain. By applying this logic to artificially intelligent systems, researchers hope to improve their functionality. Process automation, chatbots, advanced robotics, autonomous drive technology, and personal companions like Buddy and Jibo could all benefit from embodied intelligence.
Virtual Reality is the Next Training Ground for Artificial Intelligence
Virtual reality was imagined as a human simulation technology long before the wave of innovation that brought us the Oculus Rift and the headsets that followed. Now, rendering high-framerate, stereoscopic graphics from multiple viewpoints in virtual reality can match the speed and accuracy of robotic sensors and cameras. By modeling physics, motion, and material interactions, virtual reality is poised to become a simulation tool for training automatons - robots, drones, and diagnostic gear - before they need to perform in the real world. Recent advancements point to a potentially disruptive combination of virtual reality and artificial intelligence that will unlock a future with safe and competent intelligent machines, able to learn rapidly through self-training and intelligent, realistic simulations. Ongoing academic work in machine learning and virtual reality has been migrating to corporations and startups through open source initiatives and the movement of skilled people across academic, startup, and corporate workplaces.
Discovering Patterns of Autistic Planning
Galitsky, Boris (University of Girona) | Jarrold, William (University of California, Davis)
We analyze the patterns of autistic reasoning during planning tasks. The formalism of non-monotonic default logic is used to simulate autistic decision-making when adjusting an action to a context. Our current main finding is that, while people with autism may be able to process single default rules, they have a characteristic difficulty in cases where multiple default rules conflict. Even though default reasoning was intended to simulate the reasoning of typical human subjects, it turns out that following its operational semantics in a literal way reproduces the peculiarities of autistic behavior observed in the literature.
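The difficulty with conflicting defaults can be illustrated with a small sketch (not the paper's formalism): a naive procedure that applies default rules literally and in order handles a single default cleanly, but with two conflicting defaults it simply commits to whichever rule fires first. All rule and fact names below are illustrative.

```python
# Defaults of the form "if the prerequisite holds and the conclusion is
# consistent (not blocked), conclude it", applied literally and in order.

from dataclasses import dataclass
from typing import Set


@dataclass
class Default:
    prerequisite: str
    conclusion: str
    blocked_by: str          # the justification fails if this belief is present


def apply_defaults(facts: Set[str], defaults) -> Set[str]:
    beliefs = set(facts)
    for d in defaults:       # literal, in-order application
        if d.prerequisite in beliefs and d.blocked_by not in beliefs:
            beliefs.add(d.conclusion)
    return beliefs


# Single default: "birds normally fly" -> behaves as expected.
bird_flies = Default("bird", "flies", blocked_by="not_flies")
print(apply_defaults({"bird"}, [bird_flies]))                  # {'bird', 'flies'}

# Two conflicting defaults (a Nixon-diamond style case): the literal procedure
# keeps the conclusion of whichever default happens to be processed first.
pacifist = Default("quaker", "pacifist", blocked_by="not_pacifist")
hawk = Default("republican", "not_pacifist", blocked_by="pacifist")
print(apply_defaults({"quaker", "republican"}, [pacifist, hawk]))   # concludes 'pacifist'
print(apply_defaults({"quaker", "republican"}, [hawk, pacifist]))   # concludes 'not_pacifist'
```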