Goto

Collaborating Authors

 gondola


Gondola: Grounded Vision Language Planning for Generalizable Robotic Manipulation

Chen, Shizhe, Garcia, Ricardo, Pacaud, Paul, Schmid, Cordelia

arXiv.org Artificial Intelligence

Robotic manipulation faces a significant challenge in generalizing across unseen objects, environments and tasks specified by diverse language instructions. To improve generalization capabilities, recent research has incorporated large language models (LLMs) for planning and action execution. While promising, these methods often fall short in generating grounded plans in visual environments. Although efforts have been made to perform visual instructional tuning on LLMs for robotic manipulation, existing methods are typically constrained by single-view image input and struggle with precise object grounding. In this work, we introduce Gondola, a novel grounded vision-language planning model based on LLMs for generalizable robotic manipulation. Gondola takes multi-view images and history plans to produce the next action plan with interleaved texts and segmentation masks of target objects and locations. To support the training of Gondola, we construct three types of datasets using the RLBench simulator, namely robot grounded planning, multi-view referring expression and pseudo long-horizon task datasets. Gondola outperforms the state-of-the-art LLM-based method across all four generalization levels of the GemBench dataset, including novel placements, rigid objects, articulated objects and long-horizon tasks.


Blimp-based Crime Scene Analysis

Cooney, Martin, Alonso-Fernandez, Fernando

arXiv.org Artificial Intelligence

Crime is a critical problem -- which often takes place behind closed doors, posing additional difficulties for investigators. To bring hidden truths to light, evidence at indoor crime scenes must be documented before any contamination or degradation occurs. Here, we address this challenge from the perspective of artificial intelligence (AI), computer vision, and robotics: Specifically, we explore the use of a blimp as a "floating camera" to drift over and record evidence with minimal disturbance. Adopting a rapid prototyping approach, we develop a proof-of-concept to investigate capabilities required for manual or semi-autonomous operation. Consequently, our results demonstrate the feasibility of equipping indoor blimps with various components (such as RGB and thermal cameras, LiDARs, and WiFi, with 20 minutes of battery life). Moreover, we confirm the core premise: that such blimps can be used to observe crime scene evidence while generating little airflow. We conclude by proposing some ideas related to detection (e.g., of bloodstains), mapping, and path planning, with the aim of stimulating further discussion and exploration.


Flight Demonstration and Model Validation of a Prototype Variable-Altitude Venus Aerobot

Izraelevitz, Jacob S., Krishnamoorthy, Siddharth, Goel, Ashish, Turner, Caleb, Aiazzi, Carolina, Pauken, Michael, Carlson, Kevin, Walsh, Gerald, Leake, Carl, Quintana, Carlos, Lim, Christopher, Jain, Abhi, Dorsky, Leonard, Baines, Kevin, Cutts, James, Byrne, Paul K., Lachenmeier, Tim, Hall, Jeffery L.

arXiv.org Artificial Intelligence

This paper details a significant milestone towards maturing a buoyant aerial robotic platform, or aerobot, for flight in the Venus clouds. We describe two flights of our subscale altitude-controlled aerobot, fabricated from the materials necessary to survive Venus conditions. During these flights over the Nevada Black Rock desert, the prototype flew at the identical atmospheric densities as 54 to 55 km cloud layer altitudes on Venus. We further describe a first-principle aerobot dynamics model which we validate against the Nevada flight data and subsequently employ to predict the performance of future aerobots on Venus. The aerobot discussed in this paper is under JPL development for an in-situ mission flying multiple circumnavigations of Venus, sampling the chemical and physical properties of the planet's atmosphere and also remotely sensing surface properties.