Collaborating Authors

Krishnamurthy, Prashanth


Safe Multi-Robotic Arm Interaction via 3D Convex Shapes

arXiv.org Artificial Intelligence

Inter-robot collisions pose a significant safety risk when multiple robotic arms operate in close proximity. We present an online collision avoidance methodology leveraging 3D convex shape-based High-Order Control Barrier Functions (HOCBFs) to address this issue. While prior works focused on using Control Barrier Functions (CBFs) for human-robotic arm and single-arm collision avoidance, we explore the problem of collision avoidance between multiple robotic arms operating in a shared space. In our methodology, we utilize the proposed HOCBFs as centralized and decentralized safety filters. These safety filters are compatible with any nominal controller and ensure safety without significantly restricting the robots' workspace. A key challenge in implementing these filters is the computational overhead caused by the large number of safety constraints and the computation of a Hessian matrix per constraint. We address this challenge by employing numerical differentiation methods to approximate computationally intensive terms. The effectiveness of our method is demonstrated through extensive simulation studies and real-world experiments with Franka Research 3 robotic arms.
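The abstract does not include implementation details, but the general pattern of a CBF safety filter can be illustrated with a short sketch. The example below is a minimal, self-contained illustration assuming a single relative-degree-one barrier h(x), control-affine dynamics x_dot = f(x) + g(x)u, and a single-constraint QP solved in closed form; it uses a plain first-order CBF rather than the paper's 3D convex-shape HOCBFs, and the central-difference gradient only hints at the kind of numerical differentiation the authors mention. All names and values are hypothetical.

```python
import numpy as np

def numerical_gradient(h, x, eps=1e-5):
    """Central-difference gradient of a scalar barrier function h at state x."""
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        dx = np.zeros_like(x, dtype=float)
        dx[i] = eps
        g[i] = (h(x + dx) - h(x - dx)) / (2 * eps)
    return g

def cbf_safety_filter(u_nom, x, h, f, g_mat, alpha=5.0):
    """Minimally modify a nominal control so the CBF condition
    dh/dx (f(x) + g(x) u) + alpha * h(x) >= 0 holds.
    f and g_mat are the drift and input matrix already evaluated at x.
    For a single affine constraint a.u + b >= 0, the QP
    min ||u - u_nom||^2 has the closed form below."""
    grad_h = numerical_gradient(h, x)
    a = grad_h @ g_mat              # constraint direction in control space
    b = grad_h @ f + alpha * h(x)   # constant term of the constraint
    slack = a @ u_nom + b
    if slack >= 0.0 or np.allclose(a, 0.0):
        return u_nom                # nominal control is already safe
    return u_nom + (-slack / (a @ a)) * a

# Toy example: keep a 2D single-integrator point robot (u = velocity)
# outside a unit disc around the origin.
h = lambda x: x @ x - 1.0           # h(x) >= 0 <=> outside the disc
x = np.array([1.2, 0.0])
u_nom = np.array([-1.0, 0.0])       # nominal command drives it into the disc
u_safe = cbf_safety_filter(u_nom, x, h, f=np.zeros(2), g_mat=np.eye(2))
print(u_safe)
```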


Compliant Control of Quadruped Robots for Assistive Load Carrying

arXiv.org Artificial Intelligence

This paper presents a novel method for assistive load carrying using quadruped robots. The controller uses proprioceptive sensor data to estimate the external base wrench, which is used for precise control of the robot's acceleration during payload transport. The acceleration is controlled using a combination of admittance control and a Control Barrier Function (CBF)-based quadratic program (QP). The proposed controller rejects disturbances and maintains consistent performance under varying load conditions. Additionally, the built-in CBF guarantees collision avoidance with the collaborative agent in front of the robot. The efficacy of the overall controller is demonstrated through implementation on physical hardware as well as in numerical simulations. The proposed control framework aims to enhance the quadruped robot's ability to perform assistive tasks in various scenarios, from industrial applications to search and rescue operations.
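As a rough illustration of how an admittance law and a CBF constraint can be combined, here is a minimal 1D sketch: an estimated external force is mapped to a desired acceleration, which is then clipped by a second-order CBF-style bound on the distance to the collaborator ahead. The gains, masses, and distances are assumed placeholder values, and the closed-form clipping stands in for a QP; this is not the authors' actual controller.

```python
import numpy as np

def admittance_accel(f_ext, v, M_d=30.0, D_d=15.0):
    """Admittance law M_d * a + D_d * v = f_ext: map the estimated external
    force (e.g., a collaborator pulling on the payload) to a desired acceleration."""
    return (f_ext - D_d * v) / M_d

def cbf_accel_filter(a_des, d, d_dot, d_min=0.5, k1=4.0, k2=4.0):
    """Keep the distance d to the agent ahead above d_min.
    With h = d - d_min and the agent assumed to move at constant velocity
    (so d_ddot = -a for forward acceleration a), the condition
    h_ddot + k1 * h_dot + k2 * h >= 0 yields an upper bound on a."""
    h = d - d_min
    a_max = k1 * d_dot + k2 * h      # largest forward acceleration still safe
    return min(a_des, a_max)

# Toy example: the robot is pulled forward while 0.8 m behind its partner.
f_ext, v = 60.0, 0.4                  # N, m/s (assumed values)
d, d_dot = 0.8, -0.4                  # closing in at 0.4 m/s
a_des = admittance_accel(f_ext, v)
a_cmd = cbf_accel_filter(a_des, d, d_dot)
print(a_des, a_cmd)                   # the filter forces a deceleration here
```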


D-CIPHER: Dynamic Collaborative Intelligent Agents with Planning and Heterogeneous Execution for Enhanced Reasoning in Offensive Security

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have been used in cybersecurity in many ways, including their recent use as intelligent agent systems for autonomous security analysis. Capture the Flag (CTF) challenges serve as benchmarks for assessing the automated task-planning abilities of LLM agents across various cybersecurity skill sets. Early attempts to apply LLMs for solving CTF challenges relied on single-agent systems, where feedback was restricted to a single reasoning-action loop. This approach proved inadequate for handling complex CTF tasks. Drawing inspiration from real-world CTF competitions, where teams of experts collaborate, we introduce the D-CIPHER multi-agent LLM framework for collaborative CTF challenge solving. D-CIPHER integrates agents with distinct roles, enabling dynamic feedback loops to enhance reasoning on CTF challenges. It introduces the Planner-Executor agent system, consisting of a Planner agent for overall problem-solving along with multiple heterogeneous Executor agents for individual tasks, facilitating efficient allocation of responsibilities among the LLMs. Additionally, D-CIPHER incorporates an Auto-prompter agent, which improves problem-solving by exploring the challenge environment and generating a highly relevant initial prompt. We evaluate D-CIPHER on CTF benchmarks using multiple LLMs and conduct comprehensive studies to highlight the impact of our enhancements. Our results demonstrate that the multi-agent D-CIPHER system achieves a significant improvement in challenges solved, setting state-of-the-art performance on three benchmarks: 22.0% on NYU CTF Bench, 22.5% on Cybench, and 44.0% on HackTheBox. D-CIPHER is available at https://github.com/NYU-LLM-CTF/nyuctf_agents as the nyuctf_multiagent package.
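The Planner-Executor pattern described above can be sketched as a small delegation loop in which a planner LLM issues subtasks to specialist executors and revises its context from their feedback. The sketch below stubs out the LLM call and uses toy routing; the agent roles, prompts, and stopping rule are illustrative assumptions, not D-CIPHER's implementation.

```python
from dataclasses import dataclass, field

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for an actual LLM API call (assumed interface)."""
    return f"[response to: {user_prompt[:40]}...]"

@dataclass
class Executor:
    specialty: str                     # e.g., "web", "pwn", "crypto", "forensics"
    history: list = field(default_factory=list)

    def run(self, subtask: str) -> str:
        result = call_llm(f"You are a {self.specialty} specialist.", subtask)
        self.history.append((subtask, result))
        return result

class Planner:
    """Decomposes the challenge, dispatches subtasks to executors,
    and revises the plan from their feedback until a flag appears."""
    def __init__(self, executors):
        self.executors = {e.specialty: e for e in executors}

    def solve(self, challenge: str, max_rounds: int = 3) -> str:
        context = challenge
        for _ in range(max_rounds):
            plan = call_llm("You are the planner.", f"Plan the next step for: {context}")
            specialty = next(iter(self.executors))      # toy routing choice
            feedback = self.executors[specialty].run(plan)
            context = f"{challenge}\nLast result: {feedback}"
            if "flag{" in feedback:
                return feedback
        return context

planner = Planner([Executor("web"), Executor("pwn")])
print(planner.solve("Recover the flag from the vulnerable login service."))
```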


Towards Unified Benchmark and Models for Multi-Modal Perceptual Metrics

arXiv.org Artificial Intelligence

Human perception of similarity across uni- and multi-modal inputs is highly complex, making it challenging to develop automated metrics that accurately mimic it. General-purpose vision-language models, such as CLIP and large multi-modal models (LMMs), can be applied as zero-shot perceptual metrics, and several recent works have developed models specialized in narrow perceptual tasks. However, the extent to which existing perceptual metrics align with human perception remains unclear. To investigate this question, we introduce UniSim-Bench, a benchmark encompassing 7 multi-modal perceptual similarity tasks, with a total of 25 datasets. Our evaluation reveals that while general-purpose models perform reasonably well on average, they often lag behind specialized models on individual tasks. Conversely, metrics fine-tuned for specific tasks fail to generalize well to unseen, though related, tasks. As a first step towards a unified multi-task perceptual similarity metric, we fine-tune both encoder-based and generative vision-language models on a subset of the UniSim-Bench tasks. This approach yields the highest average performance and, in some cases, even surpasses task-specific models. Nevertheless, these models still struggle with generalization to unseen tasks, highlighting the ongoing challenge of learning a robust, unified perceptual similarity metric capable of capturing the human notion of similarity. The code and models are available at https://github.com/SaraGhazanfari/UniSim.
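For context, a zero-shot perceptual metric of the kind mentioned above is typically just a cosine similarity between embeddings from a pretrained vision-language model. Below is a minimal sketch using the standard Hugging Face CLIP interface; the checkpoint and file paths are arbitrary examples and are unrelated to the UniSim models themselves.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Standard public CLIP checkpoint, used here only as an example backbone.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

def clip_similarity(path_a: str, path_b: str) -> float:
    """Zero-shot perceptual similarity: cosine similarity of CLIP image embeddings."""
    images = [Image.open(path_a), Image.open(path_b)]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return float(emb[0] @ emb[1])

# Usage (paths are placeholders):
# score = clip_similarity("reference.png", "distorted.png")
```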


Distributed Inverse Dynamics Control for Quadruped Robots using Geometric Optimization

arXiv.org Artificial Intelligence

This paper presents a distributed inverse dynamics controller (DIDC) for quadruped robots that addresses the limitations of existing reactive controllers: simplified dynamical models, the inability to handle exact friction cone constraints, and the high computational requirements of whole-body controllers. Current methods either ignore friction constraints entirely or use linear approximations, leading to potential slip and instability, while comprehensive whole-body controllers demand significant computational resources. Our approach uses full rigid-body dynamics and enforces exact friction cone constraints through a novel geometric optimization-based solver. DIDC combines the required generalized forces corresponding to the actuated and unactuated spaces by projecting them onto the actuated space while satisfying the physical constraints and maintaining orthogonality between the base and joint tracking objectives. Experimental validation shows that our approach reduces foot slippage, improves orientation tracking, and converges at least two times faster than existing reactive controllers with generic QP-based implementations. The controller enables stable omnidirectional trotting at various speeds and consumes less power than comparable methods while running efficiently on embedded processors.
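The exact (second-order) friction cone constraint the controller enforces, ||f_t|| <= mu * f_n, can be illustrated with a short membership check and a Euclidean projection onto the cone. This is a generic textbook second-order-cone projection with the contact normal along z, not the paper's geometric optimization-based solver; the force values and friction coefficient are made up.

```python
import numpy as np

def in_friction_cone(f, mu):
    """Exact (second-order) friction cone: ||f_tangential|| <= mu * f_normal."""
    fz = f[2]
    ft = np.linalg.norm(f[:2])
    return fz >= 0.0 and ft <= mu * fz

def project_to_friction_cone(f, mu):
    """Euclidean projection of a contact force onto the friction cone
    (standard second-order-cone projection, z-axis = contact normal)."""
    fz = f[2]
    ft = np.linalg.norm(f[:2])
    if ft <= mu * fz:
        return f.copy()                      # already inside the cone
    if mu * ft <= -fz:
        return np.zeros(3)                   # projects to the cone's apex
    # Otherwise project onto the cone boundary.
    scale = (mu * ft + fz) / (mu**2 + 1.0)
    f_proj = np.zeros(3)
    if ft > 0:
        f_proj[:2] = mu * scale * f[:2] / ft
    f_proj[2] = scale
    return f_proj

f = np.array([8.0, 0.0, 5.0])                # tangential force too large for mu = 0.6
print(in_friction_cone(f, 0.6), project_to_friction_cone(f, 0.6))
```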


Out-of-Distribution Detection with Overlap Index

arXiv.org Machine Learning

Out-of-distribution (OOD) detection is crucial for the deployment of machine learning models in the open world. While existing OOD detectors are effective in identifying OOD samples that deviate significantly from in-distribution (ID) data, they often come with trade-offs. For instance, deep OOD detectors usually suffer from high computational costs, require tuning hyperparameters, and have limited interpretability, whereas traditional OOD detectors may have low accuracy on large high-dimensional datasets. To address these limitations, we propose a novel, effective OOD detection approach that employs an overlap index (OI)-based confidence score function to evaluate the likelihood of a given input belonging to the same distribution as the available ID samples. The proposed OI-based confidence score function is non-parametric, lightweight, and easy to interpret, hence providing strong flexibility and generality. Extensive empirical evaluations indicate that our OI-based OOD detector is competitive with state-of-the-art OOD detectors in terms of detection accuracy on a wide range of datasets while incurring lower computation and memory costs. Lastly, we show that the proposed OI-based confidence score function inherits desirable properties from OI (e.g., insensitivity to small distributional variations and robustness against Huber $\epsilon$-contamination) and is a versatile tool for estimating OI and model accuracy in specific contexts.
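The abstract does not define the overlap index estimator itself, but the underlying notion of an overlap coefficient between two distributions, OVL(p, q) = integral of min(p(x), q(x)) dx, can be illustrated with a simple histogram estimate on 1D scores. The sketch below is a generic illustration of distributional overlap, not the paper's OI-based confidence score function; the Gaussian samples are placeholders.

```python
import numpy as np

def overlap_index(sample_p, sample_q, bins=50):
    """Histogram estimate of the overlap coefficient
    OVL(p, q) = integral of min(p(x), q(x)) dx for 1-D samples."""
    lo = min(sample_p.min(), sample_q.min())
    hi = max(sample_p.max(), sample_q.max())
    p_hist, edges = np.histogram(sample_p, bins=bins, range=(lo, hi), density=True)
    q_hist, _ = np.histogram(sample_q, bins=bins, range=(lo, hi), density=True)
    width = edges[1] - edges[0]
    return float(np.sum(np.minimum(p_hist, q_hist)) * width)

rng = np.random.default_rng(0)
id_scores = rng.normal(0.0, 1.0, size=5000)        # stand-in for ID feature scores
ood_scores = rng.normal(4.0, 1.0, size=5000)       # stand-in for OOD feature scores
print(overlap_index(id_scores, id_scores[:2500]))  # high overlap: same distribution
print(overlap_index(id_scores, ood_scores))        # low overlap: shifted distribution
```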


RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training

arXiv.org Artificial Intelligence

Vision-based pose estimation of articulated robots with unknown joint angles has applications in collaborative robotics and human-robot interaction tasks. Current frameworks use neural network encoders to extract image features and downstream layers to predict joint angles and robot pose. While images of robots inherently contain rich information about the robot's physical structure, existing methods often fail to leverage it fully, thereby limiting performance under occlusions and truncations. To address this, we introduce RoboPEPP, a method that fuses information about the robot's physical model into the encoder using a masking-based self-supervised embedding-predictive architecture. Specifically, we mask the robot's joints and pre-train an encoder-predictor model to infer the joints' embeddings from surrounding unmasked regions, enhancing the encoder's understanding of the robot's physical model. The pre-trained encoder-predictor pair, along with joint angle and keypoint prediction networks, is then fine-tuned for pose and joint angle estimation. Random masking of the input during fine-tuning and keypoint filtering during evaluation further improve robustness. Our method, evaluated on several datasets, achieves the best results in robot pose and joint angle estimation while being the least sensitive to occlusions and requiring the lowest execution time.
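A masking-based embedding-predictive objective of the kind described above can be sketched in a few lines of PyTorch: a context encoder sees only the unmasked patches, a target encoder sees the full view, and a predictor regresses the embeddings of the masked (joint-covering) patches. The tiny modules, dimensions, and mask below are placeholders and do not reflect RoboPEPP's architecture.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Placeholder patch-token encoder (stands in for a ViT backbone)."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens):                 # tokens: (B, N, dim)
        return self.proj(tokens)

def embedding_prediction_loss(tokens, mask, encoder, target_encoder, predictor):
    """Embedding-predictive objective: predict the target-encoded embeddings of
    masked patches from the context encoding of the unmasked patches."""
    with torch.no_grad():
        targets = target_encoder(tokens)                     # full-view targets
    context = encoder(tokens * (~mask).unsqueeze(-1))        # masked-out context view
    preds = predictor(context)
    return ((preds - targets) ** 2)[mask].mean()             # loss on masked patches only

B, N, D = 2, 16, 64
tokens = torch.randn(B, N, D)
mask = torch.zeros(B, N, dtype=torch.bool)
mask[:, 5:9] = True                             # e.g., patches covering a joint
encoder, target_enc, predictor = TinyEncoder(D), TinyEncoder(D), nn.Linear(D, D)
loss = embedding_prediction_loss(tokens, mask, encoder, target_enc, predictor)
loss.backward()
print(float(loss))
```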


OrionNav: Online Planning for Robot Autonomy with Context-Aware LLM and Open-Vocabulary Semantic Scene Graphs

arXiv.org Artificial Intelligence

Enabling robots to autonomously navigate unknown, complex, dynamic environments and perform diverse tasks remains a fundamental challenge in developing robust autonomous physical agents. These agents must effectively perceive their surroundings while leveraging world knowledge for decision-making. Although recent approaches utilize vision-language and large language models for scene understanding and planning, they often rely on offline processing and offboard compute, and they make simplifying assumptions about the environment and perception, limiting real-world applicability. We present a novel framework for real-time onboard autonomous navigation in unknown environments that change over time by integrating multi-level abstraction in both the perception and planning pipelines. Our system fuses data from multiple onboard sensors for localization and mapping and integrates it with open-vocabulary semantics to generate hierarchical scene graphs from a continuously updated semantic object map. The LLM-based planner uses these graphs to create multi-step plans that guide low-level controllers in executing navigation tasks specified in natural language. The system's real-time operation enables the LLM to adjust its plans based on updates to the scene graph and task execution status, ensuring continuous adaptation to new situations or to cases where the current plan cannot accomplish the task, a key advantage over static or rule-based systems. We demonstrate our system's efficacy on a quadruped navigating dynamic environments, showcasing its adaptability and robustness in diverse scenarios.
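As a rough sketch of how a hierarchical scene graph might be handed to an LLM planner, the example below defines a two-level region/object structure and serializes it into a planning prompt. The data structure, fields, and prompt wording are assumptions for illustration, not the OrionNav representation.

```python
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    label: str                      # open-vocabulary label, e.g., "fire extinguisher"
    position: tuple                 # (x, y) in the map frame

@dataclass
class Region:
    name: str                       # e.g., "hallway", "lab"
    objects: list = field(default_factory=list)

def scene_graph_to_prompt(regions, task):
    """Serialize a two-level scene graph into a planning prompt for an LLM."""
    lines = [f"Task: {task}", "Known environment:"]
    for r in regions:
        objs = ", ".join(f"{o.label} at {o.position}" for o in r.objects) or "empty"
        lines.append(f"- {r.name}: {objs}")
    lines.append("Produce a numbered multi-step navigation plan.")
    return "\n".join(lines)

graph = [
    Region("hallway", [SceneObject("door", (3.0, 1.0))]),
    Region("lab", [SceneObject("fire extinguisher", (6.5, 2.0))]),
]
print(scene_graph_to_prompt(graph, "Go to the fire extinguisher and stop next to it."))
```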


EMMA: Efficient Visual Alignment in Multi-Modal LLMs

arXiv.org Artificial Intelligence

Multi-modal Large Language Models (MLLMs) have recently exhibited impressive general-purpose capabilities by leveraging vision foundation models to encode the core concepts of images into representations. These are then combined with instructions and processed by the language model to generate high-quality responses. Despite significant progress in enhancing the language component, challenges persist in optimally fusing visual encodings within the language model for task-specific adaptability. Recent research has focused on improving this fusion through modality adaptation modules, but at the cost of significantly increased model complexity and training data needs. In this paper, we propose EMMA (Efficient Multi-Modal Adaptation), a lightweight cross-modality module designed to efficiently fuse visual and textual encodings, generating instruction-aware visual representations for the language model. Our key contributions include: (1) an efficient early fusion mechanism that integrates vision and language representations with minimal added parameters (less than a 0.2% increase in model size); (2) an in-depth interpretability analysis that sheds light on the internal mechanisms of the proposed method; and (3) comprehensive experiments that demonstrate notable improvements on both specialized and general benchmarks for MLLMs. Empirical results show that EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations. Our code is available at https://github.com/SaraGhazanfari/EMMA.
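One plausible way to realize a lightweight, instruction-aware early-fusion module is to let visual tokens cross-attend to the instruction tokens and add the result residually. The sketch below is only an assumed design with placeholder dimensions; it is not EMMA's actual module.

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Toy instruction-aware visual adapter: visual tokens attend to the
    instruction tokens, and a residual connection keeps the original encoding."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_tokens, text_tokens):
        fused, _ = self.cross_attn(query=visual_tokens,
                                   key=text_tokens, value=text_tokens)
        return self.norm(visual_tokens + fused)   # instruction-aware visual tokens

B, Nv, Nt, D = 2, 32, 8, 64
visual = torch.randn(B, Nv, D)      # stand-in for vision-encoder outputs
text = torch.randn(B, Nt, D)        # stand-in for embedded instruction tokens
out = EarlyFusion(D)(visual, text)
print(out.shape)                    # torch.Size([2, 32, 64])
```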


MultiTalk: Introspective and Extrospective Dialogue for Human-Environment-LLM Alignment

arXiv.org Artificial Intelligence

LLMs have shown promising results in task planning due to their strong natural language understanding and reasoning capabilities. However, issues such as hallucinations, ambiguities in human instructions, environmental constraints, and limitations in the executing agent's capabilities often lead to flawed or incomplete plans. This paper proposes MultiTalk, an LLM-based task planning methodology that addresses these issues through a framework of introspective and extrospective dialogue loops. This approach helps ground generated plans in the context of the environment and the agent's capabilities, while also resolving uncertainties and ambiguities in the given task. These loops are enabled by specialized systems designed to extract and predict task-specific states, and flag mismatches or misalignments among the human user, the LLM agent, and the environment. Effective feedback pathways between these systems and the LLM planner foster meaningful dialogue. The efficacy of this methodology is demonstrated through its application to robotic manipulation tasks. Experiments and ablations highlight the robustness and reliability of our method, and comparisons with baselines further illustrate the superiority of MultiTalk in task planning for embodied agents.
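The introspective and extrospective loops described above can be sketched as a verify-and-replan cycle: a planner LLM proposes steps, lightweight checkers flag capability and environment mismatches, and the flags are fed back into the next prompt. The checker logic and the stubbed LLM call below are illustrative assumptions, not MultiTalk's systems.

```python
def call_llm(prompt: str) -> list:
    """Placeholder for the LLM planner (assumed to return a list of plan steps)."""
    return ["pick up the red mug", "place it on the shelf"]

def check_capabilities(plan, capabilities):
    """Introspective check: flag steps the executing agent cannot perform."""
    return [s for s in plan if not any(c in s for c in capabilities)]

def check_environment(plan, visible_objects):
    """Extrospective check: flag steps referring to objects not seen in the scene."""
    return [s for s in plan if not any(o in s for o in visible_objects)]

def plan_with_dialogue(task, capabilities, visible_objects, max_rounds=3):
    prompt = f"Task: {task}"
    for _ in range(max_rounds):
        plan = call_llm(prompt)
        issues = check_capabilities(plan, capabilities) + \
                 check_environment(plan, visible_objects)
        if not issues:
            return plan                       # grounded plan, no mismatches flagged
        prompt = f"Task: {task}\nRevise the plan; problems: {issues}"
    return plan

plan = plan_with_dialogue("put the mug on the shelf",
                          capabilities=["pick up", "place"],
                          visible_objects=["red mug", "shelf"])
print(plan)
```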