
Collaborating Authors: johnson-roberson


Joint Flow Trajectory Optimization For Feasible Robot Motion Generation from Video Demonstrations

Dong, Xiaoxiang, Johnson-Roberson, Matthew, Zhi, Weiming

arXiv.org Artificial Intelligence

Learning from human video demonstrations offers a scalable alternative to teleoperation or kinesthetic teaching, but poses challenges for robot manipulators due to embodiment differences and joint feasibility constraints. We address this problem by proposing the Joint Flow Trajectory Optimization (JFTO) framework for grasp pose generation and object trajectory imitation under the video-based Learning-from-Demonstration (LfD) paradigm. Rather than directly imitating human hand motions, our method treats demonstrations as object-centric guides, balancing three objectives: (i) selecting a feasible grasp pose, (ii) generating object trajectories consistent with demonstrated motions, and (iii) ensuring collision-free execution within robot kinematics. To capture the multimodal nature of demonstrations, we extend flow matching to SE(3) for probabilistic modeling of object trajectories, enabling density-aware imitation that avoids mode collapse. The resulting optimization integrates grasp similarity, trajectory likelihood, and collision penalties into a unified differentiable objective. We validate our approach in both simulation and real-world experiments across diverse manipulation tasks.
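
As a rough illustration of the kind of objective the abstract describes, here is a minimal, self-contained sketch (not the authors' code): three differentiable terms, a grasp-similarity reward, a trajectory log-likelihood under a learned flow model, and a collision penalty, summed and minimized with a gradient optimizer. The three term functions below are toy stand-ins, and poses are simplified to points in R^3 rather than SE(3).

```python
import torch

def flow_logprob(traj):
    # Toy stand-in for the SE(3) flow-matching density: favors smooth trajectories.
    return -((traj[1:] - traj[:-1]) ** 2).sum()

def grasp_sim(grasp):
    # Toy stand-in for similarity to a demonstrated grasp pose (here: the origin).
    return -grasp.pow(2).sum()

def collision_cost(traj, obstacle=torch.tensor([0.5, 0.0, 0.0]), radius=0.2):
    # Soft penalty for trajectory waypoints that enter an obstacle sphere.
    d = (traj - obstacle).norm(dim=-1)
    return torch.clamp(radius - d, min=0.0).sum()

grasp = torch.zeros(3, requires_grad=True)     # grasp position (toy, not full SE(3))
traj = torch.randn(20, 3, requires_grad=True)  # 20 object waypoints in R^3

opt = torch.optim.Adam([grasp, traj], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    # Unified differentiable objective: maximize grasp similarity and
    # trajectory likelihood, penalize collisions.
    loss = -grasp_sim(grasp) - flow_logprob(traj) + 10.0 * collision_cost(traj)
    loss.backward()
    opt.step()
```

Because every term is differentiable with respect to both the grasp pose and the trajectory, a single optimizer can trade the three objectives off jointly rather than solving them in sequence.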


Efficient Construction of Implicit Surface Models From a Single Image for Motion Generation

Chu, Wei-Teng, Zhang, Tianyi, Johnson-Roberson, Matthew, Zhi, Weiming

arXiv.org Artificial Intelligence

Implicit representations have been widely applied in robotics for obstacle avoidance and path planning. In this paper, we explore the problem of constructing an implicit distance representation from a single image. Past methods for implicit surface reconstruction, such as NeuS and its variants, generally require a large set of multi-view images as input and long training times. In this work, we propose Fast Image-to-Neural Surface (FINS), a lightweight framework that can reconstruct high-fidelity surfaces and SDF fields from a single image or a small set of images. FINS integrates a multi-resolution hash grid encoder with lightweight geometry and color heads, making training via an approximate second-order optimizer highly efficient and capable of converging within a few seconds. Additionally, we achieve the construction of a neural surface from only a single RGB image by leveraging pre-trained foundation models to estimate the geometry inherent in the image. Our experiments demonstrate that, under the same conditions, our method outperforms state-of-the-art baselines in both convergence speed and accuracy on surface reconstruction and SDF field estimation. Moreover, we demonstrate the applicability of FINS to robot surface-following tasks and show its scalability to a variety of benchmark datasets.
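
To make the architecture concrete, here is a minimal sketch in the spirit of the abstract (not the released code): a small multi-resolution hash-grid encoder feeding two lightweight MLP heads, one for SDF values and one for color. The level count, table size, simplified hash, and nearest-voxel lookup (real hash grids trilinearly interpolate corner features) are all illustrative choices, not the paper's.

```python
import torch
import torch.nn as nn

class HashGridEncoder(nn.Module):
    def __init__(self, levels=4, table_size=2**14, feat_dim=2, base_res=16):
        super().__init__()
        self.levels = levels
        self.table_size = table_size
        self.res = [base_res * 2**l for l in range(levels)]
        self.tables = nn.ModuleList(
            nn.Embedding(table_size, feat_dim) for _ in range(levels))
        # Simplified integer hash; real implementations typically XOR with primes.
        self.primes = torch.tensor([1, 2654435761, 805459861])

    def forward(self, x):                      # x: (N, 3) points in [0, 1]^3
        feats = []
        for lvl, table in enumerate(self.tables):
            idx = (x * self.res[lvl]).long()   # nearest-voxel lookup, for brevity
            h = (idx * self.primes).sum(-1) % self.table_size
            feats.append(table(h))
        return torch.cat(feats, dim=-1)

class FINSLikeField(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = HashGridEncoder()
        d = 4 * 2                              # levels * feat_dim
        self.sdf_head = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))
        self.rgb_head = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, x):
        f = self.enc(x)
        return self.sdf_head(f), torch.sigmoid(self.rgb_head(f))

field = FINSLikeField()
sdf, rgb = field(torch.rand(1024, 3))          # query 1024 points
```

Nearly all trainable parameters sit in the embedding tables rather than in deep MLPs, which is what makes second-order optimization of such fields cheap enough to converge in seconds.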


Cross-Modal Instructions for Robot Motion Generation

Barron, William, Dong, Xiaoxiang, Johnson-Roberson, Matthew, Zhi, Weiming

arXiv.org Artificial Intelligence

Teaching robots novel behaviors typically requires motion demonstrations via teleoperation or kinesthetic teaching, that is, physically guiding the robot. While recent work has explored using human sketches to specify desired behaviors, data collection remains cumbersome, and demonstration datasets are difficult to scale. In this paper, we introduce an alternative paradigm, Learning from Cross-Modal Instructions, where robots are shaped by demonstrations in the form of rough annotations, which can contain free-form text labels and are used in lieu of physical motion. We introduce the CrossInstruct framework, which integrates cross-modal instructions as examples into the context input of a foundation vision-language model (VLM). The VLM then iteratively queries a smaller, fine-tuned model and synthesizes the desired motion over multiple 2D views. These are subsequently fused into a coherent distribution over 3D motion trajectories in the robot's workspace. By combining the reasoning of the large VLM with a fine-grained pointing model, CrossInstruct produces executable robot behaviors that generalize beyond the environments in the limited set of instruction examples. We then introduce a downstream reinforcement learning pipeline that leverages CrossInstruct outputs to efficiently learn policies to complete fine-grained tasks. We rigorously evaluate CrossInstruct on benchmark simulation tasks and real hardware, demonstrating effectiveness without additional fine-tuning and providing a strong initialization for policies subsequently refined via reinforcement learning.
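
One plausible reading of the multi-view fusion step, sketched below with hypothetical function names: each camera view yields a 2D waypoint track, and per-waypoint observations are lifted to 3D by direct linear transform (DLT) triangulation, assuming known 3x4 projection matrices per view. The paper describes fusing into a distribution over trajectories; this sketch recovers only a point estimate.

```python
import numpy as np

def triangulate(points_2d, proj_mats):
    """DLT triangulation of one 3D point from its 2D observations across views."""
    rows = []
    for (u, v), P in zip(points_2d, proj_mats):
        # Each view contributes two linear constraints on the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]                       # null-space direction = least-squares solution
    return X[:3] / X[3]

def fuse_views(tracks_2d, proj_mats):
    """tracks_2d: list over views of (T, 2) waypoint arrays -> (T, 3) trajectory."""
    T = tracks_2d[0].shape[0]
    return np.stack([
        triangulate([trk[t] for trk in tracks_2d], proj_mats)
        for t in range(T)])
```

With the per-view 2D tracks produced upstream by the VLM and pointing model, `fuse_views` would return a single executable 3D waypoint sequence in the robot's workspace.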


Infinite Leagues Under the Sea: Photorealistic 3D Underwater Terrain Generation by Latent Fractal Diffusion Models

Zhang, Tianyi, Zhi, Weiming, Mangelson, Joshua, Johnson-Roberson, Matthew

arXiv.org Artificial Intelligence

This paper tackles the problem of generating representations of underwater 3D terrain. Off-the-shelf generative models, trained on Internet-scale data but not on specialized underwater images, exhibit degraded realism, as images of the seafloor are relatively uncommon. To this end, we introduce DreamSea, a generative model for hyper-realistic underwater scenes. DreamSea is trained on real-world image databases collected from underwater robot surveys. Images from these surveys contain massive numbers of real seafloor observations covering large areas, but are prone to noise and artifacts from the real world. We extract 3D geometry and semantics from the data with visual foundation models, and train a diffusion model that generates realistic seafloor images in RGBD channels, conditioned on novel fractal distribution-based latent embeddings. We then fuse the generated images into a 3D map, building a 3DGS model supervised by 2D diffusion priors that allows photorealistic novel-view rendering. DreamSea is rigorously evaluated, demonstrating the ability to robustly generate large-scale underwater scenes that are consistent, diverse, and photorealistic. Our work drives impact in multiple domains, spanning filming, gaming, and robot simulation.
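
As a hedged illustration of what a fractal distribution-based latent could look like (our reading, not the released code): octaves of coarse Gaussian noise, upsampled and summed with geometrically decaying amplitudes, in the style of fractional Brownian motion. Such a map could condition the diffusion model so that generated terrain inherits fractal statistics. Octave count, persistence, and the nearest-neighbor upsampling below are illustrative.

```python
import numpy as np

def fbm_latent(size=64, octaves=5, persistence=0.5, seed=0):
    """Fractional-Brownian-motion-style latent map: sum of noise octaves."""
    rng = np.random.default_rng(seed)
    latent = np.zeros((size, size))
    amp, freq = 1.0, 4
    for _ in range(octaves):
        coarse = rng.standard_normal((freq, freq))
        # Upsample coarse noise to full resolution (nearest-neighbor, for brevity).
        reps = size // freq
        layer = np.kron(coarse, np.ones((reps, reps)))
        latent += amp * layer[:size, :size]
        amp *= persistence   # each finer octave contributes less energy
        freq *= 2
    return (latent - latent.mean()) / latent.std()

z = fbm_latent()   # (64, 64) fractal latent, zero mean, unit variance
```

The persistence parameter controls the roughness of the resulting field, which is the kind of knob that makes fractal embeddings attractive for terrain conditioning.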


Raise and deliver – TechCrunch

#artificialintelligence

Quadrupeds are leaving the lab and entering the workplace, and the ongoing labor shortages plaguing many industries have only intensified the need. We've seen strong interest in Spot around industrial use cases where mobile robots can navigate worksites that include stairs, doors and other obstacles that would foil wheeled or tracked robots. Our customers are using Spot as a dynamic sensing platform to collect reliable, repeatable data around their sites for tasks like thermal anomaly detection in industrial manufacturing, radiation mapping in nuclear facilities and digital twin modeling on construction sites. Products like Spot are proving they can add real value in the real world. What will 2022 bring for these categories?


TechCrunch

#artificialintelligence

The prospect of truly zero-contact delivery seems closer -- and more important -- than ever with the pandemic changing how we think of last-mile logistics. Autonomous delivery executives from FedEx, Postmates, and Refraction AI joined us to talk about the emerging field at TechCrunch Mobility 2020. FedEx VP of Advanced Technology and Innovation Rebecca Yeung explained why the logistics giant felt that it was time to double down on its experiments in the area of autonomy. "COVID brought the term 'contactless' -- before that not many people are talking about contactless; now it's almost a preferred way of us delivering," she said. "So we see, from government to consumers, open-mindedness about, maybe in the future you would have everything delivered to you through autonomous means, and that's the preferred way."


The autonomous delivery bot that's designed to be 'nimble and fast enough' to ride in the bike lane

Daily Mail - Science & tech

Autonomous robots could soon be ferrying deliveries alongside human messengers in your city's bike lane. Refraction AI has unveiled a 5-foot-tall delivery robot dubbed REV-1 that can zip around at speeds of up to 15 miles per hour on its three wheels. It can carry the equivalent of about four or five grocery bags in its cabin, according to the firm. The company says its lightweight, nimble design will allow it to operate in both the bike lane and the roadway, making for more efficient last-mile delivery options. 'We have created the Goldilocks of autonomous vehicles in terms of size and shape,' Matthew Johnson-Roberson, cofounder and CEO at Refraction, said in a statement when the bot launched this month at TechCrunch Mobility.


Teaching self-driving cars to predict pedestrian movement

#artificialintelligence

Data collected by vehicles through cameras, LiDAR and GPS allow the researchers to capture video snippets of humans in motion and then recreate them in 3D computer simulation. With that, they've created a "biomechanically inspired recurrent neural network" that catalogs human movements. With it, they can predict poses and future locations for one or several pedestrians up to about 50 yards from the vehicle. "Prior work in this area has typically only looked at still images. It wasn't really concerned with how people move in three dimensions," said Ram Vasudevan, U-M assistant professor of mechanical engineering.
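
For readers curious what such a model might look like, here is a minimal, hypothetical sketch (not the U-M architecture): a GRU that consumes a short history of 3D pose keypoints and autoregressively rolls out future poses, from which pedestrian locations can be read off. Joint count, hidden size, and horizon are illustrative.

```python
import torch
import torch.nn as nn

class PosePredictor(nn.Module):
    """Toy recurrent pose predictor: observe T frames, roll out a future horizon."""
    def __init__(self, n_joints=17, hidden=128, horizon=30):
        super().__init__()
        self.horizon = horizon
        self.gru = nn.GRU(n_joints * 3, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_joints * 3)

    def forward(self, history):                 # history: (B, T, n_joints*3)
        _, h = self.gru(history)                # encode the observed motion
        preds, x = [], history[:, -1:]
        for _ in range(self.horizon):           # autoregressive rollout
            out, h = self.gru(x, h)
            x = self.head(out)                  # next predicted pose frame
            preds.append(x)
        return torch.cat(preds, dim=1)          # (B, horizon, n_joints*3)

model = PosePredictor()
future = model(torch.randn(2, 10, 17 * 3))      # 10 observed frames -> 30 predicted
```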


Waymo's So-Called Robo-Taxi Launch Reveals a Brutal Truth

WIRED

Waymo, the frontrunner in the self-driving car industry, today announces the moment everyone has been waiting for: It is officially "launching" a robo-taxi service in Chandler, Arizona, wherein riders will use an app to hail the vehicles to take them anywhere in an 80 to 100 square mile area, for a price. "Today, we're taking the next step in our journey with the introduction of our commercial self-driving service, Waymo One," Waymo CEO John Krafcik wrote in a blog post. The banner Waymo is unfurling, though, is tattered by caveats. Waymo One will only be available to the 400 or so people already enrolled in Waymo's early rider program, which has been running in the calm, sunny Phoenix suburb of Chandler for about 18 months. More glaringly, the cars will have a human behind the wheel, there to take control in case the car does something it shouldn't.


Preparing Self-Driving Cars for the Wild World of Developing Cities

WIRED

Self-driving cars are no longer confined to controlled test tracks or even to placid suburban streets--they're tackling real traffic in US cities such as New York, San Francisco, and Pittsburgh. Learning how to drive in places like unruly Boston, a land of creative left turns and seemingly optional yields, comes with its challenges. Even Patriots fans look like goody two-shoes compared to drivers who have little to zero respect for lanes, traffic signals, warning signs, and speed limits. On wide roads without lanes and huge, anarchic intersections all over the world, human interaction dictates traffic flows, with each driver adjusting to others' maneuvers on the spot, regardless of what the rule book says. These informal systems work for the most part, but they come at a high cost.