Goto

Collaborating Authors

 Spjut, Josef


GRS: Generating Robotic Simulation Tasks from Real-World Images

arXiv.org Artificial Intelligence

We introduce GRS (Generating Robotic Simulation tasks), a novel system to address the challenge of real-to-sim in robotics, computer vision, and AR/VR. GRS enables the creation of digital twin simulations from single real-world RGB-D observations, complete with diverse, solvable tasks for virtual agent training. We use state-of-the-art vision-language models (VLMs) to achieve a comprehensive real-to-sim pipeline. GRS operates in three stages: 1) scene comprehension using SAM2 for object segmentation and VLMs for object description, 2) matching identified objects with simulation-ready assets, and 3) generating contextually appropriate robotic tasks. Our approach ensures simulations align with task specifications by generating test suites designed to verify adherence to the task specification. We introduce a router that iteratively refines the simulation and test code to ensure the simulation is solvable by a robot policy while remaining aligned to the task specification. Our experiments demonstrate the system's efficacy in accurately identifying object correspondence, which allows us to generate task environments that closely match input environments, and enhance automated simulation task generation through our novel router mechanism.


A Deformable Interface for Human Touch Recognition using Stretchable Carbon Nanotube Dielectric Elastomer Sensors and Deep Neural Networks

arXiv.org Machine Learning

User interfaces provide an interactive window between physical and virtual environments. A new concept in the field of human-computer interaction is a soft user interface; a compliant surface that facilitates touch interaction through deformation. Despite the potential of these interfaces, they currently lack a signal processing framework that can efficiently extract information from their deformation. Here we present OrbTouch, a device that uses statistical learning algorithms, based on convolutional neural networks, to map deformations from human touch to categorical labels (i.e., gestures) and touch location using stretchable capacitor signals as inputs. We demonstrate this approach by using the device to control the popular game Tetris. OrbTouch provides a modular, robust framework to interpret deformation in soft media, laying a foundation for new modes of human computer interaction through shape changing solids.