AITopics | Tokekar, Pratap

Collaborating Authors

Tokekar, Pratap

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Boosting Sample Efficiency and Generalization in Multi-agent Reinforcement Learning via Equivariance

McClellan, Joshua, Haghani, Naveed, Winder, John, Huang, Furong, Tokekar, Pratap

arXiv.org Artificial IntelligenceOct-22-2024

Multi-Agent Reinforcement Learning (MARL) struggles with sample inefficiency and poor generalization [1]. These challenges are partially due to a lack of structure or inductive bias in the neural networks typically used in learning the policy. One such form of structure that is commonly observed in multi-agent scenarios is symmetry. The field of Geometric Deep Learning has developed Equivariant Graph Neural Networks (EGNN) that are equivariant (or symmetric) to rotations, translations, and reflections of nodes. Incorporating equivariance has been shown to improve learning efficiency and decrease error [ 2 ]. In this paper, we demonstrate that EGNNs improve the sample efficiency and generalization in MARL. However, we also show that a naive application of EGNNs to MARL results in poor early exploration due to a bias in the EGNN structure. To mitigate this bias, we present Exploration-enhanced Equivariant Graph Neural Networks or E2GN2. We compare E2GN2 to other common function approximators using common MARL benchmarks MPE and SMACv2. E2GN2 demonstrates a significant improvement in sample efficiency, greater final reward convergence, and a 2x-5x gain in over standard GNNs in our generalization tests. These results pave the way for more reliable and effective solutions in complex multi-agent systems.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2410.02581

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.46)
Energy > Power Industry (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

On the Sample Complexity of a Policy Gradient Algorithm with Occupancy Approximation for General Utility Reinforcement Learning

Barakat, Anas, Chakraborty, Souradip, Yu, Peihong, Tokekar, Pratap, Bedi, Amrit Singh

arXiv.org Artificial IntelligenceOct-5-2024

Reinforcement learning with general utilities has recently gained attention thanks to its ability to unify several problems, including imitation learning, pure exploration, and safe RL. However, prior work for solving this general problem in a unified way has mainly focused on the tabular setting. This is restrictive when considering larger state-action spaces because of the need to estimate occupancy measures during policy optimization. In this work, we address this issue and propose to approximate occupancy measures within a function approximation class using maximum likelihood estimation (MLE). We propose a simple policy gradient algorithm (PG-OMA) where an actor updates the policy parameters to maximize the general utility objective whereas a critic approximates the occupancy measure using MLE. We provide a sample complexity analysis of PG-OMA showing that our occupancy measure estimation error only scales with the dimension of our function approximation class rather than the size of the state action space. Under suitable assumptions, we establish first order stationarity and global optimality performance bounds for the proposed PG-OMA algorithm for nonconcave and concave general utilities respectively. We complement our methodological and theoretical findings with promising empirical results showing the scalability potential of our approach compared to existing tabular count-based approaches.

machine learning, occupancy measure, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2410.04108

Country:

North America > United States > Maryland (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Add feedback

GATSBI: An Online GTSP-Based Algorithm for Targeted Surface Bridge Inspection and Defect Detection

Dhami, Harnaik, Reddy, Charith, Sharma, Vishnu Dutt, Williams, Troi, Tokekar, Pratap

arXiv.org Artificial IntelligenceJun-24-2024

We study the problem of visual surface inspection of infrastructure for defects using an Unmanned Aerial Vehicle (UAV). We do not assume that the geometric model of the infrastructure is known beforehand. Our planner, termed GATSBI, plans a path in a receding horizon fashion to inspect all points on the surface of the infrastructure. The input to GATSBI consists of a 3D occupancy map created online with 3D pointclouds. Occupied voxels corresponding to the infrastructure in this map are semantically segmented and used to create an infrastructure-only occupancy map. Inspecting an infrastructure voxel requires the UAV to take images from a desired viewing angle and distance. We then create a Generalized Traveling Salesperson Problem (GTSP) instance to cluster candidate viewpoints for inspecting the infrastructure voxels and use an off-the-shelf GTSP solver to find the optimal path for the given instance. As the algorithm sees more parts of the environment over time, it replans the path to inspect uninspected parts of the infrastructure while avoiding obstacles. We evaluate the performance of our algorithm through high-fidelity simulations conducted in AirSim and real-world experiments. We compare the performance of GATSBI with a baseline inspection algorithm where the map is known a priori. Our evaluation reveals that targeting the inspection to only the segmented infrastructure voxels and planning carefully using a GTSP solver leads to a more efficient and thorough inspection than the baseline inspection algorithm.

artificial intelligence, infrastructure, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2406.16625

Country: North America > United States > Maryland (0.14)

Genre: Research Report (0.64)

Industry:

Information Technology > Robotics & Automation (0.34)
Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.66)

Add feedback

Fast k-connectivity Restoration in Multi-Robot Systems for Robust Communication Maintenance

Ishat-E-Rabban, Md, Shi, Guangyao, Bonner, Griffin, Tokekar, Pratap

arXiv.org Artificial IntelligenceApr-4-2024

Maintaining a robust communication network plays an important role in the success of a multi-robot team jointly performing an optimization task. A key characteristic of a robust cooperative multirobot system is the ability to repair the communication topology in the case of robot failure. In this paper, we focus on the Fast k-connectivity Restoration (FCR) problem, which aims to repair a network to make it k-connected with minimum robot movement. Here, a k-connected network refers to a communication topology that cannot be disconnected by removing k 1 nodes. We develop a Quadratically Constrained Program (QCP) formulation of the FCR problem, which provides a way to optimally solve the problem, but cannot handle large instances due to high computational overhead. We therefore present a scalable algorithm, called EA-SCR, for the FCR problem using graph theoretic concepts. By conducting empirical studies, we demonstrate that the EA-SCR algorithm performs within 10% of the optimal while being orders of magnitude faster. We also show that EA-SCR outperforms existing solutions by 30% in terms of the FCR distance metric.

algorithm, artificial intelligence, robot, (14 more...)

arXiv.org Artificial Intelligence

2404.03834

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.49)
Information Technology > Artificial Intelligence > Robots > Robots in the Workplace (0.41)

Add feedback

LAVA: Long-horizon Visual Action based Food Acquisition

Bhaskar, Amisha, Liu, Rui, Sharma, Vishnu D., Shi, Guangyao, Tokekar, Pratap

arXiv.org Artificial IntelligenceMar-19-2024

Robotic Assisted Feeding (RAF) addresses the fundamental need for individuals with mobility impairments to regain autonomy in feeding themselves. The goal of RAF is to use a robot arm to acquire and transfer food to individuals from the table. Existing RAF methods primarily focus on solid foods, leaving a gap in manipulation strategies for semi-solid and deformable foods. This study introduces Long-horizon Visual Action (LAVA) based food acquisition of liquid, semisolid, and deformable foods. Long-horizon refers to the goal of "clearing the bowl" by sequentially acquiring the food from the bowl. LAVA employs a hierarchical policy for long-horizon food acquisition tasks. The framework uses high-level policy to determine primitives by leveraging ScoopNet. At the mid-level, LAVA finds parameters for primitives using vision. To carry out sequential plans in the real world, LAVA delegates action execution which is driven by Low-level policy that uses parameters received from mid-level policy and behavior cloning ensuring precise trajectory execution. We validate our approach on complex real-world acquisition trials involving granular, liquid, semisolid, and deformable food types along with fruit chunks and soup acquisition. Across 46 bowls, LAVA acquires much more efficiently than baselines with a success rate of 89 +/- 4% and generalizes across realistic plate variations such as different positions, varieties, and amount of food in the bowl. Code, datasets, videos, and supplementary materials can be found on our website.

acquisition, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2403.12876

Country: North America > United States > Maryland > Prince George's County > College Park (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.67)

Add feedback

Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types

Liu, Rui, Bhaskar, Amisha, Tokekar, Pratap

arXiv.org Artificial IntelligenceMar-19-2024

In this study, we introduce a novel visual imitation network with a spatial attention module for robotic assisted feeding (RAF). The goal is to acquire (i.e., scoop) food items from a bowl. However, achieving robust and adaptive food manipulation is particularly challenging. To deal with this, we propose a framework that integrates visual perception with imitation learning to enable the robot to handle diverse scenarios during scooping. Our approach, named AVIL (adaptive visual imitation learning), exhibits adaptability and robustness across different bowl configurations in terms of material, size, and position, as well as diverse food types including granular, semi-solid, and liquid, even in the presence of distractors. We validate the effectiveness of our approach by conducting experiments on a real robot. We also compare its performance with a baseline. The results demonstrate improvement over the baseline across all scenarios, with an enhancement of up to 2.5 times in terms of a success metric. Notably, our model, trained solely on data from a transparent glass bowl containing granular cereals, showcases generalization ability when tested zero-shot on other bowl configurations with different types of food.

artificial intelligence, bowl configuration, food type, (14 more...)

arXiv.org Artificial Intelligence

2403.12891

Country: North America > United States > Maryland > Prince George's County > College Park (0.14)

Genre: Research Report > New Finding (0.55)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis

Liu, Rui, Noorani, Erfaun, Tokekar, Pratap, Baras, John S.

arXiv.org Artificial IntelligenceMar-13-2024

Reinforcement Learning (RL) has shown exceptional performance across various applications, enabling autonomous agents to learn optimal policies through interaction with their environments. However, traditional RL frameworks often face challenges in terms of iteration complexity and robustness. Risk-sensitive RL, which balances expected return and risk, has been explored for its potential to yield probabilistically robust policies, yet its iteration complexity analysis remains underexplored. In this study, we conduct a thorough iteration complexity analysis for the risk-sensitive policy gradient method, focusing on the REINFORCE algorithm and employing the exponential utility function. We obtain an iteration complexity of $\mathcal{O}(\epsilon^{-2})$ to reach an $\epsilon$-approximate first-order stationary point (FOSP). We investigate whether risk-sensitive algorithms can achieve better iteration complexity compared to their risk-neutral counterparts. Our theoretical analysis demonstrates that risk-sensitive REINFORCE can have a reduced number of iterations required for convergence. This leads to improved iteration complexity, as employing the exponential utility does not entail additional computation per iteration. We characterize the conditions under which risk-sensitive algorithms can achieve better iteration complexity. Our simulation results also validate that risk-averse cases can converge and stabilize more quickly after approximately half of the episodes compared to their risk-neutral counterparts.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2403.08955

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning

Yu, Peihong, Mishra, Manav, Koppel, Alec, Busart, Carl, Narayan, Priya, Manocha, Dinesh, Bedi, Amrit, Tokekar, Pratap

arXiv.org Artificial IntelligenceMar-13-2024

Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of efficient exploration due to the exponential increase in the size of the joint state-action space. While demonstration-guided learning has proven beneficial in single-agent settings, its direct applicability to MARL is hindered by the practical difficulty of obtaining joint expert demonstrations. In this work, we introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team. These demonstrations solely pertain to single-agent behaviors and how each agent can achieve personal goals without encompassing any cooperative elements, thus naively imitating them will not achieve cooperation due to potential conflicts. To this end, we propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate, namely personalized expert-guided MARL (PegMARL). This algorithm utilizes two discriminators: the first provides incentives based on the alignment of policy behavior with demonstrations, and the second regulates incentives based on whether the behavior leads to the desired objective. We evaluate PegMARL using personalized demonstrations in both discrete and continuous environments. The results demonstrate that PegMARL learns near-optimal policies even when provided with suboptimal demonstrations, and outperforms state-of-the-art MARL algorithms in solving coordinated tasks. We also showcase PegMARL's capability to leverage joint demonstrations in the StarCraft scenario and converge effectively even with demonstrations from non-co-trained policies.

demonstration, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2403.08936

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

REBEL: A Regularization-Based Solution for Reward Overoptimization in Reinforcement Learning from Human Feedback

Chakraborty, Souradip, Bhaskar, Amisha, Singh, Anukriti, Tokekar, Pratap, Manocha, Dinesh, Bedi, Amrit Singh

arXiv.org Artificial IntelligenceDec-21-2023

In this work, we propose REBEL, an algorithm for sample efficient reward regularization based robotic reinforcement learning from human feedback (RRLHF). Reinforcement learning (RL) performance for continuous control robotics tasks is sensitive to the underlying reward function. In practice, the reward function often ends up misaligned with human intent, values, social norms, etc., leading to catastrophic failures in the real world. We leverage human preferences to learn regularized reward functions and eventually align the agents with the true intended behavior. We introduce a novel notion of reward regularization to the existing RRLHF framework, which is termed as agent preferences. So, we not only consider human feedback in terms of preferences, we also propose to take into account the preference of the underlying RL agent while learning the reward function. We show that this helps to improve the over-optimization associated with the design of reward functions in RL. We experimentally show that REBEL exhibits up to 70% improvement in sample efficiency to achieve a similar level of episodic reward returns as compared to the state-of-the-art methods such as PEBBLE and PEBBLE+SURF.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2312.14436

Country:

North America > United States > Maryland (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Enhancing Multi-Agent Coordination through Common Operating Picture Integration

Yu, Peihong, Lee, Bhoram, Raghavan, Aswin, Samarasekara, Supun, Tokekar, Pratap, Hare, James Zachary

arXiv.org Artificial IntelligenceNov-8-2023

In multi-agent systems, agents possess only local observations of the environment. Communication between teammates becomes crucial for enhancing coordination. Past research has primarily focused on encoding local information into embedding messages which are unintelligible to humans. We find that using these messages in agent's policy learning leads to brittle policies when tested on out-of-distribution initial states. We present an approach to multi-agent coordination, where each agent is equipped with the capability to integrate its (history of) observations, actions and messages received into a Common Operating Picture (COP) and disseminate the COP. This process takes into account the dynamic nature of the environment and the shared mission. We conducted experiments in the StarCraft2 environment to validate our approach. Our results demonstrate the efficacy of COP integration, and show that COP-based training leads to robust policies compared to state-of-the-art Multi-Agent Reinforcement Learning (MARL) methods when faced with out-of-distribution initial states.

artificial intelligence, enhancing multi-agent coordination, machine learning, (1 more...)

arXiv.org Artificial Intelligence

2311.0474

Genre: Research Report > New Finding (0.53)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback