
Collaborating Authors

 S, Arjun P


SplatR: Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching

arXiv.org Artificial Intelligence

Experience Goal Visual Rearrangement task stands as a foundational challenge within Embodied AI, requiring an agent to construct a robust world model that accurately captures the goal state. The agent uses this world model to restore a shuffled scene to its original configuration, making an accurate representation of the world essential for successfully completing the task. However, existing scene representations have disadvantages: 2D and 3D semantic maps store object pose and semantic information in a grid; this approach provides limited resolution, does not inherently capture interactions between objects, and is prone to sensitivity issues and quantization errors. Although pointcloud based representations can provide more robustness to sensitivity, they lack structural semantics: identifying objects and their interactions with the world in a noisy pointcloud is challenging. Scene graph based methods often assume a clear and well defined relationship between objects, which limits the granularity of scene understanding. In this work, we present a novel framework that leverages 3D Gaussian Splatting as a 3D scene representation for the experience goal visual rearrangement task. Recent advances in volumetric scene representation, such as 3D Gaussian Splatting, offer fast rendering of high-quality, photo-realistic novel views.
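As an illustration of how such a representation could be used, below is a minimal sketch (not the paper's implementation) of the dense feature matching idea named in the title: per-pixel features from a view rendered out of the stored goal-state scene are compared against features from the current observation, and low-similarity regions are flagged as candidate rearranged objects. The feature maps, threshold, and function name are assumptions made for this example.

```python
import numpy as np

def dense_feature_change_mask(goal_feats: np.ndarray,
                              current_feats: np.ndarray,
                              threshold: float = 0.6) -> np.ndarray:
    """Flag pixels whose current features no longer match the goal render.

    goal_feats, current_feats: (H, W, D) dense feature maps, e.g. extracted
    from a view rendered out of the goal-state scene representation and from
    the live camera image at the same pose (hypothetical inputs).
    Returns a boolean (H, W) mask where per-pixel cosine similarity falls
    below `threshold`, i.e. regions that may contain a moved object.
    """
    # Normalize feature vectors so the dot product equals cosine similarity.
    g = goal_feats / (np.linalg.norm(goal_feats, axis=-1, keepdims=True) + 1e-8)
    c = current_feats / (np.linalg.norm(current_feats, axis=-1, keepdims=True) + 1e-8)
    similarity = np.sum(g * c, axis=-1)   # (H, W) per-pixel cosine similarity
    return similarity < threshold         # True where the scene appears changed

# Toy usage with random features standing in for real descriptors.
if __name__ == "__main__":
    goal = np.random.rand(64, 64, 16).astype(np.float32)
    current = goal.copy()
    current[20:30, 20:30] = np.random.rand(10, 10, 16)  # simulate a moved object
    mask = dense_feature_change_mask(goal, current)
    print("changed pixels:", int(mask.sum()))
```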


Exploring Unseen Environments with Robots using Large Language and Vision Models through a Procedurally Generated 3D Scene Representation

arXiv.org Artificial Intelligence

Recent advancements in Generative Artificial Intelligence, particularly in the realm of Large Language Models (LLMs) and Large Vision Language Models (LVLMs), have enabled the prospect of leveraging cognitive planners within robotic systems. This work focuses on solving the object goal navigation problem by mimicking human cognition to attend to, perceive, and store task-specific information, and to generate plans from it. We introduce a comprehensive framework capable of exploring an unfamiliar environment in search of an object by leveraging the capabilities of LLMs and LVLMs in understanding the underlying semantics of our world. A challenging task in using LLMs to generate high-level sub-goals is to efficiently represent the environment around the robot. We propose to use a modular 3D scene representation, with semantically rich descriptions of objects, to provide the LLM with task-relevant information. However, providing the LLM with a mass of contextual information (a rich semantic 3D scene representation) can lead to redundant and inefficient plans. We therefore propose an LLM-based pruner that leverages in-context learning to prune out goal-irrelevant information.
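To make the pruning step concrete, here is a minimal sketch, under assumed interfaces, of how an in-context-learning pruner might keep only goal-relevant entries from a semantic scene description. The few-shot examples, prompt wording, object names, and the `llm` callable are hypothetical and not taken from the paper.

```python
from typing import Callable, List

# Hypothetical few-shot examples that demonstrate the pruning behaviour in context.
FEW_SHOT = """Goal: find a mug
Scene objects: sofa, coffee table, mug rack, television, kitchen counter
Relevant objects: mug rack, kitchen counter, coffee table

Goal: find a pillow
Scene objects: bed, wardrobe, oven, nightstand, bathtub
Relevant objects: bed, nightstand, wardrobe
"""

def prune_scene_objects(goal: str,
                        scene_objects: List[str],
                        llm: Callable[[str], str]) -> List[str]:
    """Ask an LLM to drop scene entries that are irrelevant to the goal.

    `llm` is any text-in/text-out completion function (a placeholder for a
    real model client). The few-shot block above supplies the in-context
    examples; the model is expected to answer with a comma-separated list.
    """
    prompt = (
        FEW_SHOT
        + f"\nGoal: {goal}\n"
        + f"Scene objects: {', '.join(scene_objects)}\n"
        + "Relevant objects:"
    )
    answer = llm(prompt)
    kept = [name.strip() for name in answer.split(",") if name.strip()]
    # Keep only names that actually exist in the scene description, in case
    # the model mentions objects that were never observed.
    return [name for name in kept if name in scene_objects]
```

In a setup along these lines, the pruned list, rather than the full scene description, would be handed to the planning prompt, keeping the context compact.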