Collaborating Authors

 Bhagat, Sarthak


WROOM: An Autonomous Driving Approach for Off-Road Navigation

arXiv.org Artificial Intelligence

Off-road navigation is a challenging problem, both at the planning level, to produce a smooth trajectory, and at the control level, to avoid flipping over, hitting obstacles, or getting stuck on rough patches. Several recent works take a classical approach: predicting a depth map, planning a smooth trajectory, and tracking it with a controller. We instead design an end-to-end reinforcement learning (RL) system for an autonomous vehicle in off-road environments, using a custom-designed simulator built in the Unity game engine. We warm-start the agent by imitating a rule-based controller, then improve the policy with Proximal Policy Optimization (PPO) using a reward that incorporates Control Barrier Functions (CBF), which helps the agent generalize effectively to real-world scenarios. During training, multiple agents concurrently undergo domain-randomized trials in varied environments. We also propose a novel simulation environment that replicates off-road driving scenarios, and we deploy our approach on a real buggy RC car. Videos and additional results: https://sites.google.com/view/wroom-utd/home
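The CBF-based reward described above can be pictured with a small sketch. This is not the WROOM implementation; the barrier function, the decay rate `alpha`, and the penalty weight `lam` are illustrative assumptions showing how a discrete-time CBF condition can shape an RL reward:

```python
# Illustrative sketch of a CBF-shaped RL reward (not the WROOM code).
# A control barrier function h(x) is >= 0 on the safe set; the
# discrete-time CBF condition h(x_{t+1}) - h(x_t) >= -alpha * h(x_t)
# discourages the agent from approaching obstacles too quickly.

def barrier(pos, obstacle, safe_radius):
    """Toy barrier: squared distance to an obstacle minus the squared
    safety radius (positive while the vehicle is safely clear)."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    return dx * dx + dy * dy - safe_radius ** 2

def cbf_violation(h_now, h_next, alpha=0.5):
    """Amount by which a transition violates the CBF condition
    (zero when the condition holds)."""
    return max(-(h_next - h_now + alpha * h_now), 0.0)

def shaped_reward(progress, h_now, h_next, lam=10.0):
    """Task progress minus a weighted penalty for CBF violations."""
    return progress - lam * cbf_violation(h_now, h_next)

# A transition that respects the safety condition keeps the full
# progress reward; one that closes in on the obstacle too fast is
# penalized heavily.
obstacle, r = (0.0, 0.0), 1.0
h0 = barrier((3.0, 0.0), obstacle, r)                             # 8.0
safe = shaped_reward(1.0, h0, barrier((2.5, 0.0), obstacle, r))   # 1.0
risky = shaped_reward(1.0, h0, barrier((1.0, 0.0), obstacle, r))  # -39.0
```

In practice the penalty term is one component of a richer reward (progress, smoothness, etc.), but the structure is the same: safe transitions are left untouched, unsafe ones are pushed down.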


ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition

arXiv.org Artificial Intelligence

Task-oriented grasping of unfamiliar objects is a necessary skill for robots in dynamic in-home environments. Inspired by the human capability to grasp such objects through intuition about their shape and structure, we present a novel zero-shot task-oriented grasping method leveraging a geometric decomposition of the target object into simple, convex shapes that we represent in a graph structure, including geometric attributes and spatial relationships. Our approach employs minimal essential information - the object's name and the intended task - to facilitate zero-shot task-oriented grasping. We utilize the commonsense reasoning capabilities of large language models to dynamically assign semantic meaning to each decomposed part and subsequently reason over the utility of each part for the intended task. Through extensive experiments on a real-world robotics platform, we demonstrate that our grasping approach's decomposition and reasoning pipeline is capable of selecting the correct part in 92% of the cases and successfully grasping the object in 82% of the tasks we evaluate. Additional videos, experiments, code, and data are available on our project website: https://shapegrasp.github.io/.
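The decomposition-and-reasoning pipeline above can be sketched in miniature. All names here are hypothetical, and the language-model call is stubbed out with a fixed score table; the sketch only illustrates representing decomposed convex parts as a graph and choosing the highest-utility part for a task:

```python
# Hypothetical sketch (not the ShapeGrasp code): decomposed parts as
# graph nodes with geometric attributes, spatial relations as edges,
# and the LLM's per-part utility scores stubbed out as a dict.

def build_part_graph(parts, relations):
    """parts: {name: attribute dict}; relations: (a, rel, b) triples."""
    return {"nodes": parts, "edges": list(relations)}

def describe_for_llm(graph, obj_name, task):
    """Format the graph as a prompt a language model could reason over."""
    lines = [f"Object: {obj_name}. Task: {task}.", "Parts:"]
    for name, attrs in graph["nodes"].items():
        lines.append(f"- {name}: {attrs}")
    lines += [f"- {a} {rel} {b}" for a, rel, b in graph["edges"]]
    return "\n".join(lines)

def select_grasp_part(graph, utility_scores):
    """Pick the part the (stubbed) LLM rates most useful to grasp."""
    return max(graph["nodes"], key=lambda p: utility_scores.get(p, 0.0))

graph = build_part_graph(
    {"handle": {"shape": "cylinder", "size": "long"},
     "head": {"shape": "box", "size": "small"}},
    [("handle", "attached_to", "head")],
)
prompt = describe_for_llm(graph, "hammer", "drive a nail")
# Scores a commonsense LLM would plausibly assign for this task:
part = select_grasp_part(graph, {"handle": 0.9, "head": 0.2})  # "handle"
```

In the real pipeline the score table would come from querying an LLM with the prompt, and the selected part would be handed to a grasp planner.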


Knowledge-Guided Short-Context Action Anticipation in Human-Centric Videos

arXiv.org Artificial Intelligence

Anticipating actions is a key aspect of video understanding, including video production and editing [20, 41, 6, 38]. This work focuses on anticipating actions from short video segments and provides potential avenues to enhance the editing process. In particular, the ability to extract actions from a video segment can be utilized in two manners: 1) it allows for intelligent clip suggestions for future editing, namely the ability to suggest videos given what will likely happen next, and 2) it provides information on what generally would happen, which allows editors to refine their composition to either confirm or contradict a viewer's expectation.

This allows us to predict future actions accurately, particularly from short-horizon observations - a key aspect that prior works [11, 1, 22, 34, 2, 17] in action anticipation fail to cater to. In our work, we utilize Knowledge Graphs (KG) to capture the relationships between entities present in the video and link them to their respective affordances and the potential tools that could be used to afford them in a particular way. Prior work [40, 26, 23, 15] has introduced efficient methods of identifying such relationships, which can subsequently be utilized to identify the potential for certain actions.
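The entity-affordance linking can be pictured with a toy knowledge graph. The graph contents and the ranking rule below are invented for illustration; the paper's KG and scoring are considerably richer:

```python
# Toy illustration (not the paper's KG): link entities seen in a clip
# to the actions they afford, then rank candidate next actions by how
# many observed entities support them.

AFFORDANCES = {
    "knife":  {"cut", "spread"},
    "butter": {"spread"},
    "bread":  {"cut", "toast"},
    "pan":    {"fry", "toast"},
}

def anticipate(observed, kg=AFFORDANCES, top_k=2):
    """Score each action by the number of observed entities affording it."""
    scores = {}
    for entity in observed:
        for action in kg.get(entity, ()):
            scores[action] = scores.get(action, 0) + 1
    ranked = sorted(scores, key=lambda a: (-scores[a], a))
    return ranked[:top_k]

# With a knife and butter in frame, "spread" outranks "cut".
predictions = anticipate(["knife", "butter"])  # ["spread", "cut"]
```

The point of the graph structure is that even a short observation window exposes entities whose affordances constrain what can plausibly happen next.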


Sample-Efficient Learning of Novel Visual Concepts

arXiv.org Artificial Intelligence

Despite the advances made in visual object recognition, state-of-the-art deep learning models struggle to effectively recognize novel objects in a few-shot setting where only a limited number of examples are provided. Unlike humans who excel at such tasks, these models often fail to leverage known relationships between entities in order to draw conclusions about such objects. In this work, we show that incorporating a symbolic knowledge graph into a state-of-the-art recognition model enables a new approach for effective few-shot classification. In our proposed neuro-symbolic architecture and training methodology, the knowledge graph is augmented with additional relationships extracted from a small set of examples, improving its ability to recognize novel objects by considering the presence of interconnected entities. Unlike existing few-shot classifiers, we show that this enables our model to incorporate not only objects but also abstract concepts and affordances. The existence of the knowledge graph also makes this approach amenable to interpretability through analysis of the relationships contained within it. We empirically show that our approach outperforms current state-of-the-art few-shot multi-label classification methods on the COCO dataset and evaluate the addition of abstract concepts and affordances on the Visual Genome dataset.
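The graph-augmentation idea can be sketched as follows. The entity sets and the simple overlap score are illustrative stand-ins for the neuro-symbolic model described above, which combines the graph with a learned recognition backbone:

```python
# Illustrative sketch (not the paper's architecture): a symbolic KG
# maps each class to related entities/concepts; a few-shot example
# adds a novel class, and classification scores each class by how
# well its related entities match those detected in an image.

kg = {
    "zebra": {"stripes", "four_legs", "savanna"},
    "horse": {"mane", "four_legs", "grassland"},
}

def add_novel_class(kg, name, related_entities):
    """Augment the graph with relationships from a few examples."""
    kg.setdefault(name, set()).update(related_entities)

def score(kg, cls, detected):
    """Fraction of the class's related entities detected in the image."""
    related = kg[cls]
    return len(related & detected) / len(related)

# Two or three annotated examples suffice to add a novel class.
add_novel_class(kg, "okapi", {"stripes", "four_legs", "forest"})

detected = {"stripes", "four_legs", "forest"}
best = max(kg, key=lambda c: score(kg, c, detected))  # "okapi"
```

Because the graph is explicit, the same mechanism accommodates abstract concepts and affordances as node sets, and a prediction can be explained by listing which related entities were found.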