AITopics | Jatavallabhula, Krishna Murthy

Collaborating Authors

Jatavallabhula, Krishna Murthy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ALT-Pilot: Autonomous navigation with Language augmented Topometric maps

Omama, Mohammad, Inani, Pranav, Paul, Pranjal, Yellapragada, Sarat Chandra, Jatavallabhula, Krishna Murthy, Chinchali, Sandeep, Krishna, Madhava

arXiv.org Artificial IntelligenceOct-3-2023

We present an autonomous navigation system that operates without assuming HD LiDAR maps of the environment. Our system, ALT-Pilot, relies only on publicly available road network information and a sparse (and noisy) set of crowdsourced language landmarks. With the help of onboard sensors and a language-augmented topometric map, ALT-Pilot autonomously pilots the vehicle to any destination on the road network. We achieve this by leveraging vision-language models pre-trained on web-scale data to identify potential landmarks in a scene, incorporating vision-language features into the recursive Bayesian state estimation stack to generate global (route) plans, and a reactive trajectory planner and controller operating in the vehicle frame. We implement and evaluate ALT-Pilot in simulation and on a real, full-scale autonomous vehicle and report improvements over state-of-the-art topometric navigation systems by a factor of 3x on localization accuracy and 5x on goal reachability

artificial intelligence, language augmented topometric map, natural language, (2 more...)

arXiv.org Artificial Intelligence

2310.02324

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.53)

Add feedback

ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

Gu, Qiao, Kuwajerwala, Alihusein, Morin, Sacha, Jatavallabhula, Krishna Murthy, Sen, Bipasha, Agarwal, Aditya, Rivera, Corban, Paul, William, Ellis, Kirsty, Chellappa, Rama, Gan, Chuang, de Melo, Celso Miguel, Tenenbaum, Joshua B., Torralba, Antonio, Shkurti, Florian, Paull, Liam

arXiv.org Artificial IntelligenceSep-28-2023

For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and efficient for task-driven perception and planning. Recent approaches have attempted to leverage features from large vision-language models to encode semantics in 3D representations. However, these approaches tend to produce maps with per-point feature vectors, which do not scale well in larger environments, nor do they contain semantic spatial relationships between entities in the environment, which are useful for downstream planning. In this work, we propose ConceptGraphs, an open-vocabulary graph-structured representation for 3D scenes. ConceptGraphs is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association. The resulting representations generalize to novel semantic classes, without the need to collect large 3D datasets or finetune models. We demonstrate the utility of this representation through a number of downstream planning tasks that are specified through abstract (language) prompts and require complex reasoning over spatial and semantic concepts. (Project page: https://concept-graphs.github.io/ Explainer video: https://youtu.be/mRhNkQwRYnc )

natural language, perception and planning, text processing, (3 more...)

arXiv.org Artificial Intelligence

2309.1665

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.53)
Information Technology > Artificial Intelligence > Machine Learning (0.53)

Add feedback

Tactile Estimation of Extrinsic Contact Patch for Stable Placement

Ota, Kei, Jha, Devesh K., Jatavallabhula, Krishna Murthy, Kanezaki, Asako, Tenenbaum, Joshua B.

arXiv.org Artificial IntelligenceSep-25-2023

Precise perception of contact interactions is essential for the fine-grained manipulation skills for robots. In this paper, we present the design of feedback skills for robots that must learn to stack complex-shaped objects on top of each other. To design such a system, a robot should be able to reason about the stability of placement from very gentle contact interactions. Our results demonstrate that it is possible to infer the stability of object placement based on tactile readings during contact formation between the object and its environment. In particular, we estimate the contact patch between a grasped object and its environment using force and tactile observations to estimate the stability of the object during a contact formation. The contact patch could be used to estimate the stability of the object upon the release of the grasp. The proposed method is demonstrated on various pairs of objects that are used in a very popular board game.

artificial intelligence, extrinsic contact patch, tactile estimation, (1 more...)

arXiv.org Artificial Intelligence

2309.14552

Genre: Research Report > New Finding (0.53)

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

Follow Anything: Open-set detection, tracking, and following in real-time

Maalouf, Alaa, Jadhav, Ninad, Jatavallabhula, Krishna Murthy, Chahine, Makram, Vogt, Daniel M., Wood, Robert J., Torralba, Antonio, Rus, Daniela

arXiv.org Artificial IntelligenceAug-10-2023

Tracking and following objects of interest is critical to several robotics use cases, ranging from industrial automation to logistics and warehousing, to healthcare and security. In this paper, we present a robotic system to detect, track, and follow any object in real-time. Our approach, dubbed ``follow anything'' (FAn), is an open-vocabulary and multimodal model -- it is not restricted to concepts seen at training time and can be applied to novel classes at inference time using text, images, or click queries. Leveraging rich visual descriptors from large-scale pre-trained models (foundation models), FAn can detect and segment objects by matching multimodal queries (text, images, clicks) against an input image sequence. These detected and segmented objects are tracked across image frames, all while accounting for occlusion and object re-emergence. We demonstrate FAn on a real-world robotic system (a micro aerial vehicle) and report its ability to seamlessly follow the objects of interest in a real-time control loop. FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second. To enable rapid adoption, deployment, and extensibility, we open-source all our code on our project webpage at https://github.com/alaamaalouf/FollowAnything . We also encourage the reader the watch our 5-minutes explainer video in this https://www.youtube.com/watch?v=6Mgt3EPytrw .

machine learning, natural language, real time system, (18 more...)

arXiv.org Artificial Intelligence

2308.05737

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report (0.64)

Industry: Information Technology (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

CLIPGraphs: Multimodal Graph Networks to Infer Object-Room Affinities

Agrawal, Ayush, Arora, Raghav, Datta, Ahana, Banerjee, Snehasis, Bhowmick, Brojeshwar, Jatavallabhula, Krishna Murthy, Sridharan, Mohan, Krishna, Madhava

arXiv.org Artificial IntelligenceJun-2-2023

This paper introduces a novel method for determining the best room to place an object in, for embodied scene rearrangement. While state-of-the-art approaches rely on large language models (LLMs) or reinforcement learned (RL) policies for this task, our approach, CLIPGraphs, efficiently combines commonsense domain knowledge, data-driven methods, and recent advances in multimodal learning. Specifically, it (a)encodes a knowledge graph of prior human preferences about the room location of different objects in home environments, (b) incorporates vision-language features to support multimodal queries based on images or text, and (c) uses a graph network to learn object-room affinities based on embeddings of the prior knowledge and the vision-language features. We demonstrate that our approach provides better estimates of the most appropriate location of objects from a benchmark set of object categories in comparison with state-of-the-art baselines

artificial intelligence, knowledge graph, natural language, (13 more...)

arXiv.org Artificial Intelligence

2306.0154

Genre: Research Report > Promising Solution (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

PAC-NeRF: Physics Augmented Continuum Neural Radiance Fields for Geometry-Agnostic System Identification

Li, Xuan, Qiao, Yi-Ling, Chen, Peter Yichen, Jatavallabhula, Krishna Murthy, Lin, Ming, Jiang, Chenfanfu, Gan, Chuang

arXiv.org Artificial IntelligenceMar-9-2023

Existing approaches to system identification (estimating the physical parameters of an object) from videos assume known object geometries. This precludes their applicability in a vast majority of scenes where object geometries are complex or unknown. In this work, we aim to identify parameters characterizing a physical system from a set of multi-view videos without any assumption on object geometry or topology. To this end, we propose "Physics Augmented Continuum Neural Radiance Fields" (PAC-NeRF), to estimate both the unknown geometry and physical parameters of highly dynamic objects from multi-view videos. We design PAC-NeRF to only ever produce physically plausible states by enforcing the neural radiance field to follow the conservation laws of continuum mechanics. For this, we design a hybrid Eulerian-Lagrangian representation of the neural radiance field, i.e., we use the Eulerian grid representation for NeRF density and color fields, while advecting the neural radiance fields via Lagrangian particles. This hybrid Eulerian-Lagrangian representation seamlessly blends efficient neural rendering with the material point method (MPM) for robust differentiable physics simulation. We validate the effectiveness of our proposed framework on geometry and physical parameter estimation over a vast range of materials, including elastic bodies, plasticine, sand, Newtonian and non-Newtonian fluids, and demonstrate significant performance gain on most tasks.

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Artificial Intelligence

2303.05512

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.89)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Add feedback

Rethinking Optimization with Differentiable Simulation from a Global Perspective

Antonova, Rika, Yang, Jingyun, Jatavallabhula, Krishna Murthy, Bohg, Jeannette

arXiv.org Machine LearningJun-28-2022

Differentiable simulation is a promising toolkit for fast gradient-based policy optimization and system identification. However, existing approaches to differentiable simulation have largely tackled scenarios where obtaining smooth gradients has been relatively easy, such as systems with mostly smooth dynamics. In this work, we study the challenges that differentiable simulation presents when it is not feasible to expect that a single descent reaches a global optimum, which is often a problem in contact-rich scenarios. We analyze the optimization landscapes of diverse scenarios that contain both rigid bodies and deformable objects. In dynamic environments with highly deformable objects and fluids, differentiable simulators produce rugged landscapes with nonetheless useful gradients in some parts of the space. We propose a method that combines Bayesian optimization with semi-local 'leaps' to obtain a global search method that can use gradients effectively, while also maintaining robust performance in regions with noisy gradients. We show that our approach outperforms several gradient-based and gradient-free baselines on an extensive set of experiments in simulation, and also validate the method using experiments with a real robot and deformables. Videos and supplementary materials are available at https://tinyurl.com/globdiff

artificial intelligence, gradient, machine learning, (15 more...)

arXiv.org Machine Learning

2207.00167

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.89)

Add feedback

gradSim: Differentiable simulation for system identification and visuomotor control

Jatavallabhula, Krishna Murthy, Macklin, Miles, Golemo, Florian, Voleti, Vikram, Petrini, Linda, Weiss, Martin, Considine, Breandan, Parent-Levesque, Jerome, Xie, Kevin, Erleben, Kenny, Paull, Liam, Shkurti, Florian, Nowrouzezahrai, Derek, Fidler, Sanja

arXiv.org Artificial IntelligenceApr-6-2021

We consider the problem of estimating an object's physical properties such as mass, friction, and elasticity directly from video sequences. Such a system identification problem is fundamentally ill-posed due to the loss of information during image formation. Current solutions require precise 3D labels which are labor-intensive to gather, and infeasible to create for many systems such as deformable solids or cloth. We present gradSim, a framework that overcomes the dependence on 3D supervision by leveraging differentiable multiphysics simulation and differentiable rendering to jointly model the evolution of scene dynamics and image formation. This novel combination enables backpropagation from pixels in a video sequence through to the underlying physical attributes that generated them. Moreover, our unified computation graph -- spanning from the dynamics and through the rendering process -- enables learning in challenging visuomotor control tasks, without relying on state-based (3D) supervision, while obtaining performance competitive to or better than techniques that rely on precise 3D labels.

deep learning, international conference, upstream oil & gas, (17 more...)

arXiv.org Artificial Intelligence

2104.02646

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Add feedback

DRACO: Weakly Supervised Dense Reconstruction And Canonicalization of Objects

Sajnani, Rahul, Sanchawala, AadilMehdi, Jatavallabhula, Krishna Murthy, Sridhar, Srinath, Krishna, K. Madhava

arXiv.org Artificial IntelligenceNov-25-2020

We present DRACO, a method for Dense Reconstruction And Canonicalization of Object shape from one or more RGB images. Canonical shape reconstruction, estimating 3D object shape in a coordinate space canonicalized for scale, rotation, and translation parameters, is an emerging paradigm that holds promise for a multitude of robotic applications. Prior approaches either rely on painstakingly gathered dense 3D supervision, or produce only sparse canonical representations, limiting real-world applicability. DRACO performs dense canonicalization using only weak supervision in the form of camera poses and semantic keypoints at train time. During inference, DRACO predicts dense object-centric depth maps in a canonical coordinate-space, solely using one or more RGB images of an object. Extensive experiments on canonical shape reconstruction and pose estimation show that DRACO is competitive or superior to fully-supervised methods.

artificial intelligence, canonicalization, video understanding, (17 more...)

arXiv.org Artificial Intelligence

2011.12912

Country:

North America > United States (0.14)
North America > Canada (0.14)
Asia > India (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback