AITopics | Jatavallabhula, Krishna Murthy

Collaborating Authors

Jatavallabhula, Krishna Murthy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Anticipate & Act : Integrating LLMs and Classical Planning for Efficient Task Execution in Household Environments

Arora, Raghav, Singh, Shivam, Swaminathan, Karthik, Datta, Ahana, Banerjee, Snehasis, Bhowmick, Brojeshwar, Jatavallabhula, Krishna Murthy, Sridharan, Mohan, Krishna, Madhava

arXiv.org Artificial IntelligenceFeb-4-2025

Assistive agents performing household tasks such as making the bed or cooking breakfast often compute and execute actions that accomplish one task at a time. However, efficiency can be improved by anticipating upcoming tasks and computing an action sequence that jointly achieves these tasks. State-of-the-art methods for task anticipation use data-driven deep networks and Large Language Models (LLMs), but they do so at the level of high-level tasks and/or require many training examples. Our framework leverages the generic knowledge of LLMs through a small number of prompts to perform high-level task anticipation, using the anticipated tasks as goals in a classical planning system to compute a sequence of finer-granularity actions that jointly achieve these goals. We ground and evaluate our framework's abilities in realistic scenarios in the VirtualHome environment and demonstrate a 31% reduction in execution time compared with a system that does not consider upcoming tasks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.02066

Country:

Asia > India (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

PickScan: Object discovery and reconstruction from handheld interactions

van der Brugge, Vincent, Pollefeys, Marc, Tenenbaum, Joshua B., Tewari, Ayush, Jatavallabhula, Krishna Murthy

arXiv.org Artificial IntelligenceNov-17-2024

Reconstructing compositional 3D representations of scenes, where each object is represented with its own 3D model, is a highly desirable capability in robotics and augmented reality. However, most existing methods rely heavily on strong appearance priors for object discovery, therefore only working on those classes of objects on which the method has been trained, or do not allow for object manipulation, which is necessary to scan objects fully and to guide object discovery in challenging scenarios. We address these limitations with a novel interaction-guided and class-agnostic method based on object displacements that allows a user to move around a scene with an RGB-D camera, hold up objects, and finally outputs one 3D model per held-up object. Our main contribution to this end is a novel approach to detecting user-object interactions and extracting the masks of manipulated objects. On a custom-captured dataset, our pipeline discovers manipulated objects with 78.3% precision at 100% recall and reconstructs them with a mean chamfer distance of 0.90cm. Compared to Co-Fusion, the only comparable interaction-based and class-agnostic baseline, this corresponds to a reduction in chamfer distance of 73% while detecting 99% fewer false positives.

artificial intelligence, machine learning, reconstruction, (16 more...)

arXiv.org Artificial Intelligence

2411.11196

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Add feedback

Gaussian Splatting Visual MPC for Granular Media Manipulation

Tseng, Wei-Cheng, Zhang, Ellina, Jatavallabhula, Krishna Murthy, Shkurti, Florian

arXiv.org Artificial IntelligenceOct-13-2024

Recent advancements in learned 3D representations have enabled significant progress in solving complex robotic manipulation tasks, particularly for rigid-body objects. However, manipulating granular materials such as beans, nuts, and rice, remains challenging due to the intricate physics of particle interactions, high-dimensional and partially observable state, inability to visually track individual particles in a pile, and the computational demands of accurate dynamics prediction. Current deep latent dynamics models often struggle to generalize in granular material manipulation due to a lack of inductive biases. In this work, we propose a novel approach that learns a visual dynamics model over Gaussian splatting representations of scenes and leverages this model for manipulating granular media via Model-Predictive Control. Our method enables efficient optimization for complex manipulation tasks on piles of granular media. We evaluate our approach in both simulated and real-world settings, demonstrating its ability to solve unseen planning tasks and generalize to new environments in a zero-shot transfer. We also show significant prediction and manipulation performance improvements compared to existing granular media manipulation methods.

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Artificial Intelligence

2410.0974

Country: North America (0.46)

Genre: Research Report (0.85)

Industry:

Energy > Oil & Gas > Downstream (0.41)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution

Rivera, Corban, Byrd, Grayson, Paul, William, Feldman, Tyler, Booker, Meghan, Holmes, Emma, Handelman, David, Kemp, Bethany, Badger, Andrew, Schmidt, Aurora, Jatavallabhula, Krishna Murthy, de Melo, Celso M, Seenivasan, Lalithkumar, Unberath, Mathias, Chellappa, Rama

arXiv.org Artificial IntelligenceOct-8-2024

Robotic planning and execution in open-world environments is a complex problem due to the vast state spaces and high variability of task embodiment. Recent advances in perception algorithms, combined with Large Language Models (LLMs) for planning, offer promising solutions to these challenges, as the common sense reasoning capabilities of LLMs provide a strong heuristic for efficiently searching the action space. However, prior work fails to address the possibility of hallucinations from LLMs, which results in failures to execute the planned actions largely due to logical fallacies at high- or low-levels. To contend with automation failure due to such hallucinations, we introduce ConceptAgent, a natural language-driven robotic platform designed for task execution in unstructured environments. With a focus on scalability and reliability of LLM-based planning in complex state and action spaces, we present innovations designed to limit these shortcomings, including 1) Predicate Grounding to prevent and recover from infeasible actions, and 2) an embodied version of LLM-guided Monte Carlo Tree Search with self reflection. In simulation experiments, ConceptAgent achieved a 19% task completion rate across three room layouts and 30 easy level embodied tasks outperforming other state-of-the-art LLM-driven reasoning baselines that scored 10.26% and 8.11% on the same benchmark. Additionally, ablation studies on moderate to hard embodied tasks revealed a 20% increase in task completion from the baseline agent to the fully enhanced ConceptAgent, highlighting the individual and combined contributions of Predicate Grounding and LLM-guided Tree Search to enable more robust automation in complex state and action spaces.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.06108

Genre: Research Report > Promising Solution (0.34)

Industry:

Energy (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

STAMP: Differentiable Task and Motion Planning via Stein Variational Gradient Descent

Lee, Yewon, Huang, Philip, Jatavallabhula, Krishna Murthy, Li, Andrew Z., Damken, Fabian, Heiden, Eric, Smith, Kevin, Nowrouzezahrai, Derek, Ramos, Fabio, Shkurti, Florian

arXiv.org Artificial IntelligenceJan-7-2024

Planning for many manipulation tasks, such as using tools or assembling parts, often requires both symbolic and geometric reasoning. Task and Motion Planning (TAMP) algorithms typically solve these problems by conducting a tree search over high-level task sequences while checking for kinematic and dynamic feasibility. This can be inefficient as the width of the tree can grow exponentially with the number of possible actions and objects. In this paper, we propose a novel approach to TAMP that relaxes discrete-and-continuous TAMP problems into inference problems on a continuous domain. Our method, Stein Task and Motion Planning (STAMP) subsequently solves this new problem using a gradient-based variational inference algorithm called Stein Variational Gradient Descent, by obtaining gradients from a parallelized differentiable physics simulator. By introducing relaxations to the discrete variables, leveraging parallelization, and approaching TAMP as an Bayesian inference problem, our method is able to efficiently find multiple diverse plans in a single optimization run. We demonstrate our method on two TAMP problems and benchmark them against existing TAMP baselines.

artificial intelligence, machine learning, stamp, (16 more...)

arXiv.org Artificial Intelligence

2310.01775

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Add feedback

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM

Keetha, Nikhil, Karhade, Jay, Jatavallabhula, Krishna Murthy, Yang, Gengshan, Scherer, Sebastian, Ramanan, Deva, Luiten, Jonathon

arXiv.org Artificial IntelligenceDec-4-2023

Dense simultaneous localization and mapping (SLAM) is pivotal for embodied scene understanding. Recent work has shown that 3D Gaussians enable high-quality reconstruction and real-time rendering of scenes using multiple posed cameras. In this light, we show for the first time that representing a scene by 3D Gaussians can enable dense SLAM using a single unposed monocular RGB-D camera. Our method, SplaTAM, addresses the limitations of prior radiance field-based representations, including fast rendering and optimization, the ability to determine if areas have been previously mapped, and structured map expansion by adding more Gaussians. We employ an online tracking and mapping pipeline while tailoring it to specifically use an underlying Gaussian representation and silhouette-guided optimization via differentiable rendering. Extensive experiments show that SplaTAM achieves up to 2X state-of-the-art performance in camera pose estimation, map construction, and novel-view synthesis, demonstrating its superiority over existing approaches, while allowing real-time rendering of a high-resolution dense 3D map.

artificial intelligence, gaussian, representation, (16 more...)

arXiv.org Artificial Intelligence

2312.02126

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

AnyLoc: Towards Universal Visual Place Recognition

Keetha, Nikhil, Mishra, Avneesh, Karhade, Jay, Jatavallabhula, Krishna Murthy, Scherer, Sebastian, Krishna, Madhava, Garg, Sourav

arXiv.org Artificial IntelligenceNov-28-2023

Visual Place Recognition (VPR) is vital for robot localization. To date, the most performant VPR approaches are environment- and task-specific: while they exhibit strong performance in structured environments (predominantly urban driving), their performance degrades severely in unstructured environments, rendering most approaches brittle to robust real-world deployment. In this work, we develop a universal solution to VPR -- a technique that works across a broad range of structured and unstructured environments (urban, outdoors, indoors, aerial, underwater, and subterranean environments) without any re-training or fine-tuning. We demonstrate that general-purpose feature representations derived from off-the-shelf self-supervised models with no VPR-specific training are the right substrate upon which to build such a universal VPR solution. Combining these derived features with unsupervised feature aggregation enables our suite of methods, AnyLoc, to achieve up to 4X significantly higher performance than existing approaches. We further obtain a 6% improvement in performance by characterizing the semantic properties of these features, uncovering unique domains which encapsulate datasets from similar environments. Our detailed experiments and analysis lay a foundation for building VPR solutions that may be deployed anywhere, anytime, and across anyview. We encourage the readers to explore our project page and interactive demos: https://anyloc.github.io/.

artificial intelligence, dataset, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2308.00688

Country: North America > Canada (0.14)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Differentiable Visual Computing for Inverse Problems and Machine Learning

Spielberg, Andrew, Zhong, Fangcheng, Rematas, Konstantinos, Jatavallabhula, Krishna Murthy, Oztireli, Cengiz, Li, Tzu-Mao, Nowrouzezahrai, Derek

arXiv.org Artificial IntelligenceNov-21-2023

Originally designed for applications in computer graphics, visual computing (VC) methods synthesize information about physical and virtual worlds, using prescribed algorithms optimized for spatial computing. VC is used to analyze geometry, physically simulate solids, fluids, and other media, and render the world via optical techniques. These fine-tuned computations that operate explicitly on a given input solve so-called forward problems, VC excels at. By contrast, deep learning (DL) allows for the construction of general algorithmic models, side stepping the need for a purely first principles-based approach to problem solving. DL is powered by highly parameterized neural network architectures -- universal function approximators -- and gradient-based search algorithms which can efficiently search that large parameter space for optimal models. This approach is predicated by neural network differentiability, the requirement that analytic derivatives of a given problem's task metric can be computed with respect to neural network's parameters. Neural networks excel when an explicit model is not known, and neural network training solves an inverse problem in which a model is computed from data. While VC provides a strong inductive bias about the dynamics of realworld phenomena -- one that would otherwise have to be learned from scratch in a pure DL context -- its inability to adapt its mathematical models based on observations of real-world phenomena precludes its direct integration into larger DL-based systems.

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Artificial Intelligence

2312.04574

Country:

North America > United States > Massachusetts (0.14)
North America > United States > Kansas (0.14)
North America > United States > California (0.14)
(2 more...)

Genre: Overview (0.68)

Industry: Energy > Oil & Gas > Upstream (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving

Choudhary, Tushar, Dewangan, Vikrant, Chandhok, Shivam, Priyadarshan, Shubham, Jain, Anushka, Singh, Arun K., Srivastava, Siddharth, Jatavallabhula, Krishna Murthy, Krishna, K. Madhava

arXiv.org Artificial IntelligenceNov-14-2023

Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV blends recent advances in general-purpose language and vision models with BEV-structured map representations, eliminating the need for task-specific models. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely on both the ability to interpret free-form natural language queries, and in grounding these queries to the visual context embedded into the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.

large language model, lvlm, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2310.02251

Genre: Research Report (0.40)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

ConceptFusion: Open-set Multimodal 3D Mapping

Jatavallabhula, Krishna Murthy, Kuwajerwala, Alihusein, Gu, Qiao, Omama, Mohd, Chen, Tao, Maalouf, Alaa, Li, Shuang, Iyer, Ganesh, Saryazdi, Soroush, Keetha, Nikhil, Tewari, Ayush, Tenenbaum, Joshua B., de Melo, Celso Miguel, Krishna, Madhava, Paull, Liam, Shkurti, Florian, Torralba, Antonio

arXiv.org Artificial IntelligenceOct-23-2023

Building 3D maps of the environment is central to robot navigation, planning, and interaction with objects in a scene. Most existing approaches that integrate semantic concepts with 3D maps largely remain confined to the closed-set setting: they can only reason about a finite set of concepts, pre-defined at training time. Further, these maps can only be queried using class labels, or in recent work, using text prompts. We address both these issues with ConceptFusion, a scene representation that is (1) fundamentally open-set, enabling reasoning beyond a closed set of concepts and (ii) inherently multimodal, enabling a diverse range of possible queries to the 3D map, from language, to images, to audio, to 3D geometry, all working in concert. ConceptFusion leverages the open-set capabilities of today's foundation models pre-trained on internet-scale data to reason about concepts across modalities such as natural language, images, and audio. We demonstrate that pixel-aligned open-set features can be fused into 3D maps via traditional SLAM and multi-view fusion approaches. This enables effective zero-shot spatial reasoning, not needing any additional training or finetuning, and retains long-tailed concepts better than supervised approaches, outperforming them by more than 40% margin on 3D IoU. We extensively evaluate ConceptFusion on a number of real-world datasets, simulated home environments, a real-world tabletop manipulation task, and an autonomous driving platform. We showcase new avenues for blending foundation models with 3D open-set multimodal mapping. For more information, visit our project page https://concept-fusion.github.io or watch our 5-minute explainer video https://www.youtube.com/watch?v=rkXgws8fiDs

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2302.07241

Country:

Asia (0.46)
North America > Canada (0.46)

Genre: Research Report (1.00)

Industry:

Information Technology (0.66)
Transportation > Ground > Road (0.66)
Automobiles & Trucks (0.48)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback