AITopics | Durugkar, Ishan

Collaborating Authors

Durugkar, Ishan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

N-Agent Ad Hoc Teamwork

Wang, Caroline, Rahman, Arrasy, Durugkar, Ishan, Liebman, Elad, Stone, Peter

arXiv.org Artificial IntelligenceApr-16-2024

Current approaches to learning cooperative behaviors in multi-agent settings assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls \textit{all} agents in the scenario, while in ad hoc teamwork, the learning algorithm usually assumes control over only a $\textit{single}$ agent in the scenario. However, many cooperative settings in the real world are much less restrictive. For example, in an autonomous driving scenario, a company might train its cars with the same learning algorithm, yet once on the road, these cars must cooperate with cars from another company. Towards generalizing the class of scenarios that cooperative learning methods can address, we introduce $N$-agent ad hoc teamwork, in which a set of autonomous agents must interact and cooperate with dynamically varying numbers and types of teammates at evaluation time. This paper formalizes the problem, and proposes the $\textit{Policy Optimization with Agent Modelling}$ (POAM) algorithm. POAM is a policy gradient, multi-agent reinforcement learning approach to the NAHT problem, that enables adaptation to diverse teammate behaviors by learning representations of teammate behaviors. Empirical evaluation on StarCraft II tasks shows that POAM improves cooperative task returns compared to baseline approaches, and enables out-of-distribution generalization to unseen teammates.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2404.1074

Country: North America > United States > Texas (0.14)

Genre:

Research Report (0.82)
Overview (0.67)

Industry:

Information Technology (0.48)
Leisure & Entertainment > Games (0.34)
Transportation > Ground (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

$f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences

Agarwal, Siddhant, Durugkar, Ishan, Stone, Peter, Zhang, Amy

arXiv.org Artificial IntelligenceOct-10-2023

Goal-Conditioned Reinforcement Learning (RL) problems often have access to sparse rewards where the agent receives a reward signal only when it has achieved the goal, making policy optimization a difficult problem. Several works augment this sparse reward with a learned dense reward function, but this can lead to sub-optimal policies if the reward is misaligned. Moreover, recent works have demonstrated that effective shaping rewards for a particular problem can depend on the underlying learning algorithm. This paper introduces a novel way to encourage exploration called $f$-Policy Gradients, or $f$-PG. $f$-PG minimizes the f-divergence between the agent's state visitation distribution and the goal, which we show can lead to an optimal policy. We derive gradients for various f-divergences to optimize this objective. Our learning paradigm provides dense learning signals for exploration in sparse reward settings. We further introduce an entropy-regularized policy optimization objective, that we call $state$-MaxEnt RL (or $s$-MaxEnt RL) as a special case of our objective. We show that several metric-based shaping rewards like L2 can be used with $s$-MaxEnt RL, providing a common ground to study such metric-based shaping rewards with efficient exploration. We find that $f$-PG has better performance compared to standard policy gradient methods on a challenging gridworld as well as the Point Maze and FetchReach environments. More information on our website https://agarwalsiddhant10.github.io/projects/fpg.html.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2310.06794

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

DM$^2$: Decentralized Multi-Agent Reinforcement Learning for Distribution Matching

Wang, Caroline, Durugkar, Ishan, Liebman, Elad, Stone, Peter

arXiv.org Artificial IntelligenceMar-12-2023

Current approaches to multi-agent cooperation rely heavily on centralized mechanisms or explicit communication protocols to ensure convergence. This paper studies the problem of distributed multi-agent learning without resorting to centralized components or explicit communication. It examines the use of distribution matching to facilitate the coordination of independent agents. In the proposed scheme, each agent independently minimizes the distribution mismatch to the corresponding component of a target visitation distribution. The theoretical analysis shows that under certain conditions, each agent minimizing its individual distribution mismatch allows the convergence to the joint policy that generated the target distribution. Further, if the target distribution is from a joint policy that optimizes a cooperative task, the optimal policy for a combination of this task reward and the distribution matching reward is the same joint policy. This insight is used to formulate a practical algorithm (DM$^2$), in which each individual agent matches a target distribution derived from concurrently sampled trajectories from a joint expert policy. Experimental validation on the StarCraft domain shows that combining (1) a task reward, and (2) a distribution matching reward for expert demonstrations for the same task, allows agents to outperform a naive distributed baseline. Additional experiments probe the conditions under which expert demonstrations need to be sampled to obtain the learning benefits.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2206.00233

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Wasserstein Distance Maximizing Intrinsic Control

Durugkar, Ishan, Hansen, Steven, Spencer, Stephen, Mnih, Volodymyr

arXiv.org Artificial IntelligenceOct-28-2021

Mutual information based objectives have shown some success in learning skills that reach a diverse set of states in this setting. These objectives include a KL-divergence term, which is maximized by visiting distinct states even if those states are not far apart in the MDP. This paper presents an approach that rewards the agent for learning skills that maximize the Wasserstein distance of their state visitation from the start state of the skill. It shows that such an objective leads to a policy that covers more distance in the MDP than diversity based objectives, and validates the results on a variety of Atari environments.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Artificial Intelligence

2110.15331

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.51)

Industry: Education > Focused Education (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Reducing Sampling Error in Batch Temporal Difference Learning

Pavse, Brahma, Durugkar, Ishan, Hanna, Josiah, Stone, Peter

arXiv.org Artificial IntelligenceAug-15-2020

Temporal difference (TD) learning is one of the main foundations of modern reinforcement learning. This paper studies the use of TD(0), a canonical TD algorithm, to estimate the value function of a given policy from a batch of data. In this batch setting, we show that TD(0) may converge to an inaccurate value function because the update following an action is weighted according to the number of times that action occurred in the batch -- not the true probability of the action under the given policy. To address this limitation, we introduce \textit{policy sampling error corrected}-TD(0) (PSEC-TD(0)). PSEC-TD(0) first estimates the empirical distribution of actions in each state in the batch and then uses importance sampling to correct for the mismatch between the empirical weighting and the correct weighting for updates following each action. We refine the concept of a certainty-equivalence estimate and argue that PSEC-TD(0) is a more data efficient estimator than TD(0) for a fixed batch of data. Finally, we conduct an empirical evaluation of PSEC-TD(0) on three batch value function learning tasks, with a hyperparameter sensitivity analysis, and show that PSEC-TD(0) produces value function estimates with lower mean squared error than TD(0).

artificial intelligence, reinforcement learning, value function, (18 more...)

arXiv.org Artificial Intelligence

2008.06738

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

An Imitation from Observation Approach to Sim-to-Real Transfer

Desai, Siddarth, Durugkar, Ishan, Karnan, Haresh, Warnell, Garrett, Hanna, Josiah, Stone, Peter

arXiv.org Artificial IntelligenceAug-4-2020

The sim to real transfer problem deals with leveraging large amounts of inexpensive simulation experience to help artificial agents learn behaviors intended for the real world more efficiently. One approach to sim-to-real transfer is using interactions with the real world to make the simulator more realistic, called grounded sim to-real transfer. In this paper, we show that a particular grounded sim-to-real approach, grounded action transformation, is closely related to the problem of imitation from observation IfO, learning behaviors that mimic the observations of behavior demonstrations. After establishing this relationship, we hypothesize that recent state-of-the-art approaches from the IfO literature can be effectively repurposed for such grounded sim-to-real transfer. To validate our hypothesis we derive a new sim-to-real transfer algorithm - generative adversarial reinforced action transformation (GARAT) - based on adversarial imitation from observation techniques. We run experiments in several simulation domains with mismatched dynamics, and find that agents trained with GARAT achieve higher returns in the real world compared to existing black-box sim-to-real methods

artificial intelligence, simulator, survey article, (17 more...)

arXiv.org Artificial Intelligence

2008.01594

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Multi-Preference Actor Critic

Durugkar, Ishan, Hausknecht, Matthew, Swaminathan, Adith, MacAlpine, Patrick

arXiv.org Artificial IntelligenceApr-5-2019

Policy gradient algorithms typically combine discounted future rewards with an estimated value function, to compute the direction and magnitude of parameter updates. However, for most Reinforcement Learning tasks, humans can provide additional insight to constrain the policy learning. We introduce a general method to incorporate multiple different feedback channels into a single policy gradient loss. In our formulation, the Multi-Preference Actor Critic (M-PAC), these different types of feedback are implemented as constraints on the policy. We use a Lagrangian relaxation to satisfy these constraints using gradient descent while learning a policy that maximizes rewards. Experiments in Atari and Pendulum verify that constraints are being respected and can accelerate the learning process.

artificial intelligence, demonstration, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1904.03295

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (0.48)
Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Add feedback

Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning

Das, Rajarshi, Dhuliawala, Shehzaad, Zaheer, Manzil, Vilnis, Luke, Durugkar, Ishan, Krishnamurthy, Akshay, Smola, Alex, McCallum, Andrew

arXiv.org Artificial IntelligenceDec-30-2018

Knowledge bases (KB), both automatically and manually constructed, are often incomplete --- many valid facts can be inferred from the KB by synthesizing existing information. A popular approach to KB completion is to infer new relations by combinatory reasoning over the information found along other paths connecting a pair of entities. Given the enormous size of KBs and the exponential number of paths, previous path-based models have considered only the problem of predicting a missing relation given two entities or evaluating the truth of a proposed triple. Additionally, these methods have traditionally used random paths between fixed entity pairs or more recently learned to pick paths between them. We propose a new algorithm MINERVA, which addresses the much more difficult and practical task of answering questions where the relation is known, but only one entity. Since random walks are impractical in a setting with combinatorially many destinations from a start node, we present a neural reinforcement learning approach which learns how to navigate the graph conditioned on the input query to find predictive paths. Empirically, this approach obtains state-of-the-art results on several datasets, significantly outperforming prior methods.

deep learning, minerva, neural network, (22 more...)

arXiv.org Artificial Intelligence

1711.05851

Country:

North America > United States > Texas (0.14)
North America > United States > Massachusetts (0.14)

Genre: Research Report (1.00)

Industry:

Media (0.68)
Government > Regional Government > North America Government > United States Government (0.46)
Leisure & Entertainment > Sports > Basketball (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Add feedback

Adversarial Goal Generation for Intrinsic Motivation

Durugkar, Ishan (University of Texas at Austin) | Stone, Peter (University of Texas at Austin)

AAAI ConferencesFeb-8-2018

Generally in Reinforcement Learning the goal, or reward signal, is given by the environment and cannot be controlled by the agent. We propose to introduce an intrinsic motivation module that will select a reward function for the agent to learn to achieve. We will use a Universal Value Function Approximator, that takes as input both the state and the parameters of this reward function as the goal to predict the value function (or action-value function) to generalize across these goals. This module will be trained to generate goals such that the agent's learning is maximized. Thus, this is also a method for automatic curriculum learning.

agent, artificial intelligence, reinforcement learning, (14 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States > Texas > Travis County > Austin (0.15)

Industry: Leisure & Entertainment > Sports (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Inverting Variational Autoencoders for Improved Generative Accuracy

Gemp, Ian, Durugkar, Ishan, Parente, Mario, Dyar, M. Darby, Mahadevan, Sridhar

arXiv.org Machine LearningAug-24-2017

Recent advances in semi-supervised learning with deep generative models have shown promise in generalizing from small labeled datasets ($\mathbf{x},\mathbf{y}$) to large unlabeled ones ($\mathbf{x}$). In the case where the codomain has known structure, a large unfeatured dataset ($\mathbf{y}$) is potentially available. We develop a parameter-efficient, deep semi-supervised generative model for the purpose of exploiting this untapped data source. Empirical results show improved performance in disentangling latent variable semantics as well as improved discriminative prediction on Martian spectroscopic and handwritten digit domains.

deep learning, neural network, spectra, (16 more...)

arXiv.org Machine Learning

1608.05983

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback