AITopics | Sachdeva, Enna

Collaborating Authors

Sachdeva, Enna

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks

Kang, Haoqiang, Sachdeva, Enna, Gupta, Piyush, Bae, Sangjae, Lee, Kwonjoon

arXiv.org Artificial IntelligenceMar-9-2025

Vision-Language Models (VLMs) have recently shown promising advancements in sequential decision-making tasks through task-specific fine-tuning. However, common fine-tuning methods, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) techniques like Proximal Policy Optimization (PPO), present notable limitations: SFT assumes Independent and Identically Distributed (IID) data, while PPO focuses on maximizing cumulative rewards. These limitations often restrict solution diversity and hinder generalization in multi-step reasoning tasks. To address these challenges, we introduce a novel framework, GFlowVLM, a framework that fine-tune VLMs using Generative Flow Networks (GFlowNets) to promote generation of diverse solutions for complex reasoning tasks. GFlowVLM models the environment as a non-Markovian decision process, allowing it to capture long-term dependencies essential for real-world applications. It takes observations and task descriptions as inputs to prompt chain-of-thought (CoT) reasoning which subsequently guides action selection. We use task based rewards to fine-tune VLM with GFlowNets. This approach enables VLMs to outperform prior fine-tuning methods, including SFT and RL. Empirical results demonstrate the effectiveness of GFlowVLM on complex tasks such as card games (NumberLine, BlackJack) and embodied planning tasks (ALFWorld), showing enhanced training efficiency, solution diversity, and stronger generalization capabilities across both in-distribution and out-of-distribution scenarios.

large language model, machine learning, trajectory, (19 more...)

arXiv.org Artificial Intelligence

2503.06514

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Generalized Mission Planning for Heterogeneous Multi-Robot Teams via LLM-constructed Hierarchical Trees

Gupta, Piyush, Isele, David, Sachdeva, Enna, Huang, Pin-Hao, Dariush, Behzad, Lee, Kwonjoon, Bae, Sangjae

arXiv.org Artificial IntelligenceJan-27-2025

We present a novel mission-planning strategy for heterogeneous multi-robot teams, taking into account the specific constraints and capabilities of each robot. Our approach employs hierarchical trees to systematically break down complex missions into manageable sub-tasks. We develop specialized APIs and tools, which are utilized by Large Language Models (LLMs) to efficiently construct these hierarchical trees. Once the hierarchical tree is generated, it is further decomposed to create optimized schedules for each robot, ensuring adherence to their individual constraints and capabilities. We demonstrate the effectiveness of our framework through detailed examples covering a wide range of missions, showcasing its flexibility and scalability.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2501.16539

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry:

Government > Military (0.73)
Leisure & Entertainment > Games (0.62)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Optimal Driver Warning Generation in Dynamic Driving Environment

Li, Chenran, Xu, Aolin, Sachdeva, Enna, Misu, Teruhisa, Dariush, Behzad

arXiv.org Artificial IntelligenceNov-9-2024

The driver warning system that alerts the human driver about potential risks during driving is a key feature of an advanced driver assistance system. Existing driver warning technologies, mainly the forward collision warning and unsafe lane change warning, can reduce the risk of collision caused by human errors. However, the current design methods have several major limitations. Firstly, the warnings are mainly generated in a one-shot manner without modeling the ego driver's reactions and surrounding objects, which reduces the flexibility and generality of the system over different scenarios. Additionally, the triggering conditions of warning are mostly rule-based threshold-checking given the current state, which lacks the prediction of the potential risk in a sufficiently long future horizon. In this work, we study the problem of optimally generating driver warnings by considering the interactions among the generated warning, the driver behavior, and the states of ego and surrounding vehicles on a long horizon. The warning generation problem is formulated as a partially observed Markov decision process (POMDP). An optimal warning generation framework is proposed as a solution to the proposed POMDP. The simulation experiments demonstrate the superiority of the proposed solution to the existing warning generation methods.

artificial intelligence, machine learning, vehicle, (17 more...)

arXiv.org Artificial Intelligence

2411.06306

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report (0.50)

Industry: Automobiles & Trucks (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.91)

Add feedback

Disentangled Neural Relational Inference for Interpretable Motion Prediction

Dax, Victoria M., Li, Jiachen, Sachdeva, Enna, Agarwal, Nakul, Kochenderfer, Mykel J.

arXiv.org Artificial IntelligenceJan-7-2024

Effective interaction modeling and behavior prediction of dynamic agents play a significant role in interactive motion planning for autonomous robots. Although existing methods have improved prediction accuracy, few research efforts have been devoted to enhancing prediction model interpretability and out-of-distribution (OOD) generalizability. This work addresses these two challenging aspects by designing a variational auto-encoder framework that integrates graph-based representations and time-sequence models to efficiently capture spatio-temporal relations between interactive agents and predict their dynamics. Our model infers dynamic interaction graphs in a latent space augmented with interpretable edge features that characterize the interactions. Moreover, we aim to enhance model interpretability and performance in OOD scenarios by disentangling the latent space of edge features, thereby strengthening model versatility and robustness. We validate our approach through extensive experiments on both simulated and real-world datasets. The results show superior performance compared to existing methods in modeling spatio-temporal relations, motion prediction, and identifying time-invariant latent features.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/LRA.2023.3342554

2401.03599

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.34)

Industry: Transportation (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning

Sachdeva, Enna, Agarwal, Nakul, Chundi, Suhas, Roelofs, Sean, Li, Jiachen, Kochenderfer, Mykel, Choi, Chiho, Dariush, Behzad

arXiv.org Artificial IntelligenceNov-8-2023

The widespread adoption of commercial autonomous vehicles (AVs) and advanced driver assistance systems (ADAS) may largely depend on their acceptance by society, for which their perceived trustworthiness and interpretability to riders are crucial. In general, this task is challenging because modern autonomous systems software relies heavily on black-box artificial intelligence models. Towards this goal, this paper introduces a novel dataset, Rank2Tell, a multi-modal ego-centric dataset for Ranking the importance level and Telling the reason for the importance. Using various close and open-ended visual question answering, the dataset provides dense annotations of various semantic, spatial, temporal, and relational attributes of various important objects in complex traffic scenarios. The dense annotations and unique attributes of the dataset make it a valuable resource for researchers working on visual scene understanding and related fields. Furthermore, we introduce a joint model for joint importance level ranking and natural language captions generation to benchmark our dataset and demonstrate performance with quantitative evaluations.

artificial intelligence, joint importance ranking and reasoning, natural language, (2 more...)

arXiv.org Artificial Intelligence

2309.06597

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.53)
Information Technology > Artificial Intelligence > Natural Language (0.53)

Add feedback

Domain Knowledge Driven Pseudo Labels for Interpretable Goal-Conditioned Interactive Trajectory Prediction

Sun, Lingfeng, Tang, Chen, Niu, Yaru, Sachdeva, Enna, Choi, Chiho, Misu, Teruhisa, Tomizuka, Masayoshi, Zhan, Wei

arXiv.org Artificial IntelligenceAug-12-2022

Motion forecasting in highly interactive scenarios is a challenging problem in autonomous driving. In such scenarios, we need to accurately predict the joint behavior of interacting agents to ensure the safe and efficient navigation of autonomous vehicles. Recently, goal-conditioned methods have gained increasing attention due to their advantage in performance and their ability to capture the multimodality in trajectory distribution. In this work, we study the joint trajectory prediction problem with the goal-conditioned framework. In particular, we introduce a conditional-variational-autoencoder-based (CVAE) model to explicitly encode different interaction modes into the latent space. However, we discover that the vanilla model suffers from posterior collapse and cannot induce an informative latent space as desired. To address these issues, we propose a novel approach to avoid KL vanishing and induce an interpretable interactive latent space with pseudo labels. The proposed pseudo labels allow us to incorporate domain knowledge on interaction in a flexible manner. We motivate the proposed method using an illustrative toy example. In addition, we validate our framework on the Waymo Open Motion Dataset with both quantitative and qualitative evaluations.

artificial intelligence, goal pair, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2203.15112

Country: North America > United States (0.28)

Genre: Research Report (0.84)

Industry: Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.49)

Add feedback