Collaborating Authors: Doshi, Prashant


A Hierarchical Bayesian model for Inverse RL in Partially-Controlled Environments

arXiv.org Artificial Intelligence

Robots learning from observations in the real world using inverse reinforcement learning (IRL) may encounter objects or agents in the environment, other than the expert, that cause nuisance observations during the demonstration. These confounding elements are typically removed in fully-controlled environments such as virtual simulations or lab settings. When complete removal is impossible, the nuisance observations must be filtered out. However, identifying the source of each observation is difficult when large numbers of observations are made. To address this, we present a hierarchical Bayesian model that incorporates both the expert's and the confounding elements' observations, thereby explicitly modeling the diverse observations a robot may receive. We extend an existing IRL algorithm, originally designed to work under partial occlusion of the expert, to consider these diverse observations. In a simulated robotic sorting domain containing both occlusion and confounding elements, we demonstrate the model's effectiveness. In particular, our technique outperforms several comparative methods, second only to having perfect knowledge of the subject's trajectory.
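
As a hedged illustration of the filtering problem described above, the sketch below assigns each observation a posterior probability of having come from the expert versus a confounding element, given assumed per-source likelihoods. The observation model and names are hypothetical and are not the paper's hierarchical Bayesian model.

    # Hypothetical sketch: posterior source assignment for nuisance observations.
    # The Gaussian-like likelihoods below are assumptions, not the paper's model.
    from math import exp

    def source_posterior(obs, likelihoods, priors):
        """Posterior probability that each source generated `obs`."""
        joint = [p * lik(obs) for lik, p in zip(likelihoods, priors)]
        total = sum(joint)
        return [j / total for j in joint]

    # Toy 1-D observation model: expert observations near 0, a confounder near 5.
    expert_lik = lambda x: exp(-(x - 0.0) ** 2)
    confound_lik = lambda x: exp(-(x - 5.0) ** 2)

    for x in [0.3, 4.8, 2.5]:
        post = source_posterior(x, [expert_lik, confound_lik], [0.5, 0.5])
        print(f"obs={x}: P(expert)={post[0]:.2f}, P(confounder)={post[1]:.2f}")

Observations with a high expert posterior would then be kept for IRL, while likely-confounder observations are filtered out.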


Many Agent Reinforcement Learning Under Partial Observability

arXiv.org Artificial Intelligence

Recent renewed interest in multi-agent reinforcement learning (MARL) has generated an impressive array of techniques that leverage deep reinforcement learning, primarily actor-critic architectures, and can be applied to a limited range of settings in terms of observability and communication. However, a continuing limitation of much of this work is the curse of dimensionality when it comes to representations based on joint actions, which grow exponentially with the number of agents. In this paper, we squarely focus on this challenge of scalability. We apply the key insight of action anonymity, which leads to permutation invariance of joint actions, to two recently presented deep MARL algorithms, MADDPG and IA2C, and compare these instantiations to another recent technique that leverages action anonymity, viz., mean-field MARL. We show that our instantiations can learn the optimal behavior in a broader class of agent networks than the mean-field method, using a recently introduced pragmatic domain.
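
To make the scalability point concrete, the minimal sketch below (illustrative only, not the MADDPG or IA2C instantiations themselves) contrasts the number of ordered joint actions with the number of action configurations, i.e., counts of agents per action, that suffice once joint actions are permutation invariant.

    # Illustrative comparison of joint-action representations with and without
    # action anonymity: under anonymity a joint action is summarized by how many
    # agents take each action, which grows combinatorially rather than exponentially.
    from math import comb

    def num_ordered_joint_actions(num_agents, num_actions):
        return num_actions ** num_agents

    def num_action_configurations(num_agents, num_actions):
        # Multisets of size num_agents over num_actions choices ("stars and bars").
        return comb(num_agents + num_actions - 1, num_actions - 1)

    for n in (5, 10, 20):
        print(n, num_ordered_joint_actions(n, 3), num_action_configurations(n, 3))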


Recurrent Sum-Product-Max Networks for Decision Making in Perfectly-Observed Environments

arXiv.org Artificial Intelligence

Recent investigations into sum-product-max networks (SPMN) that generalize sum-product networks (SPN) offer a data-driven alternative for decision making, which has predominantly relied on handcrafted models. SPMNs computationally represent a probabilistic decision-making problem whose solution scales linearly in the size of the network. However, SPMNs are not well suited for sequential decision making over multiple time steps. In this paper, we present recurrent SPMNs (RSPMN) that learn from and model decision-making data over time. RSPMNs utilize a template network that is unfolded as needed depending on the length of the data sequence. This is significant as RSPMNs not only inherit the benefits of SPMNs in being data driven and mostly tractable, but are also well suited for sequential problems. We establish conditions on the template network that guarantee that the resulting SPMN is valid, and present a structure learning algorithm to learn a sound template network. We demonstrate that the RSPMNs learned on a testbed of sequential decision-making data sets generate maximum expected utilities (MEUs) and policies that are close to optimal on perfectly-observed domains. They easily improve on a recent batch-constrained reinforcement learning method, which is important because RSPMNs offer a new model-based approach to offline reinforcement learning.
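
A hedged sketch of the unfolding idea follows (generic names, not the RSPMN implementation): the same template parameters are applied at every time step, threading an interface value forward, so the model size stays fixed while the unrolled network matches the sequence length.

    # Sketch of unrolling a template over a data sequence; `toy_step` stands in
    # for one application of the template network and is purely illustrative.
    def unroll(template_step, params, sequence, init_interface):
        interface = init_interface
        for x_t in sequence:
            interface = template_step(params, interface, x_t)
        return interface

    def toy_step(params, interface, x_t):
        w1, w2 = params
        return w1 * interface + w2 * interface * x_t

    print(unroll(toy_step, (0.6, 0.4), [1.0, 0.5, 2.0], init_interface=1.0))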


Online Structure Learning for Feed-Forward and Recurrent Sum-Product Networks

Neural Information Processing Systems

Sum-product networks have recently emerged as an attractive representation due to their dual view as a special type of deep neural network with clear semantics and a special type of probabilistic graphical model for which inference is always tractable. Those properties follow from some conditions (i.e., completeness and decomposability) that must be respected by the structure of the network. As a result, it is not easy to specify a valid sum-product network by hand and therefore structure learning techniques are typically used in practice. This paper describes a new online structure learning technique for feed-forward and recurrent SPNs. The algorithm is demonstrated on real-world datasets with continuous features for which it is not clear what network architecture might be best, including sequence datasets of varying length.
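
The completeness and decomposability conditions mentioned above can be made concrete with a small sketch (a toy evaluator, not the paper's structure learner); the classes below evaluate a tiny SPN and assert the two conditions when nodes are built.

    # Minimal SPN evaluation with validity checks; hypothetical helper classes.
    # Completeness: children of a sum node share the same variable scope.
    # Decomposability: children of a product node have disjoint scopes.

    class Leaf:
        def __init__(self, var, pmf):
            self.scope = {var}
            self.var, self.pmf = var, pmf
        def value(self, assignment):
            return self.pmf[assignment[self.var]]

    class Sum:
        def __init__(self, children, weights):
            scopes = [c.scope for c in children]
            assert all(s == scopes[0] for s in scopes), "completeness violated"
            self.scope = scopes[0]
            self.children, self.weights = children, weights
        def value(self, assignment):
            return sum(w * c.value(assignment)
                       for w, c in zip(self.weights, self.children))

    class Product:
        def __init__(self, children):
            self.scope = set()
            for c in children:
                assert self.scope.isdisjoint(c.scope), "decomposability violated"
                self.scope |= c.scope
            self.children = children
        def value(self, assignment):
            v = 1.0
            for c in self.children:
                v *= c.value(assignment)
            return v

    # P(A, B) as a mixture of two independent components.
    spn = Sum(
        [Product([Leaf("A", {0: 0.8, 1: 0.2}), Leaf("B", {0: 0.3, 1: 0.7})]),
         Product([Leaf("A", {0: 0.1, 1: 0.9}), Leaf("B", {0: 0.6, 1: 0.4})])],
        [0.5, 0.5])
    print(spn.value({"A": 1, "B": 0}))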


A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress

arXiv.org Machine Learning

Inverse reinforcement learning is the problem of inferring the reward function of an observed agent, given its policy or behavior. Researchers perceive IRL both as a problem and as a class of methods. By categorically surveying the current literature in IRL, this article serves as a reference for researchers and practitioners in machine learning to understand the challenges of IRL and select the approaches best suited for the problem at hand. The survey formally introduces the IRL problem along with its central challenges, which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size. The article elaborates how the current methods mitigate these challenges. We further discuss extensions of traditional IRL methods to settings with: (i) inaccurate and incomplete perception, (ii) an incomplete model, (iii) multiple rewards, and (iv) non-linear reward functions. The discussion concludes with some broad advances in the research area and currently open research questions.
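
As a worked illustration of the inference challenge (a generic linear-reward setup, not any specific surveyed method): with a linear reward r(s) = w · phi(s), the value of observed behavior depends on the reward only through discounted feature expectations, which many IRL methods estimate from demonstrations and then match.

    # Illustrative only: empirical discounted feature expectations of demonstrations,
    # and the value of that behavior under one candidate reward weight vector w.
    import numpy as np

    def feature_expectations(trajectories, phi, gamma=0.95):
        """Empirical discounted feature expectations of the observed behavior."""
        mu = np.zeros(len(phi(trajectories[0][0])))
        for traj in trajectories:
            for t, s in enumerate(traj):
                mu += (gamma ** t) * np.asarray(phi(s))
        return mu / len(trajectories)

    # Toy domain: states are integers; features are [is_even, magnitude].
    phi = lambda s: [float(s % 2 == 0), float(s)]
    demos = [[0, 2, 4], [1, 2, 3]]
    mu_expert = feature_expectations(demos, phi)
    w = np.array([1.0, -0.1])   # one candidate reward weight vector
    print("expert feature expectations:", mu_expert)
    print("value of observed behavior under w:", w @ mu_expert)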


Reinforcement Learning for Heterogeneous Teams with PALO Bounds

arXiv.org Artificial Intelligence

We introduce reinforcement learning for heterogeneous teams in which rewards for an agent are additively factored into local costs, stimuli unique to each agent, and global rewards, which are shared by all agents in the domain. Motivating domains include coordination of varied robotic platforms, which incur different costs for the same action but share an overall goal. We present two templates for learning in this setting with factored rewards: a generalization of Perkins' Monte Carlo exploring starts for POMDPs to canonical MPOMDPs, with a single policy mapping joint observations of all agents to joint actions (MCES-MP); and another with each agent individually mapping joint observations to its own action (MCES-FMP). We use probably approximately local optimal (PALO) bounds to analyze sample complexity, instantiating these templates to PALO learning. We promote sample efficiency by including a policy space pruning technique, and evaluate the approaches on three domains of heterogeneous agents, demonstrating that MCES-FMP yields improved policies in fewer samples compared to MCES-MP and a previous benchmark.
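
A minimal sketch of the additive reward factoring described above (the function names and cost values are hypothetical): each agent's reward combines a shared global reward with a local cost that differs across heterogeneous platforms.

    def factored_reward(local_costs, global_reward, joint_action, joint_state):
        """Per-agent rewards: shared global reward minus each agent's local action cost."""
        g = global_reward(joint_state, joint_action)
        return [g - local_costs[i](joint_action[i]) for i in range(len(joint_action))]

    # Heterogeneous platforms: the same action costs each robot a different amount.
    local_costs = [lambda a: {"move": 1.0, "wait": 0.0}[a],   # ground robot
                   lambda a: {"move": 3.0, "wait": 0.5}[a]]   # aerial robot
    global_reward = lambda s, a: 10.0 if s == "goal_reached" else 0.0

    print(factored_reward(local_costs, global_reward, ("move", "wait"), "goal_reached"))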


A Framework and Method for Online Inverse Reinforcement Learning

arXiv.org Artificial Intelligence

Inverse reinforcement learning (IRL) is the problem of learning the preferences of an agent from observations of its behavior on a task. While this problem has been well investigated, the related problem of online IRL, where observations are incrementally accrued yet the demands of the application often prohibit a full rerun of an IRL method, has received relatively less attention. We introduce the first formal framework for online IRL, called incremental IRL (I2RL), and a new method that advances maximum entropy IRL with hidden variables to this setting. Our formal analysis shows that the new method has monotonically improving performance with more demonstration data, as well as probabilistically bounded error, both under full and partial observability. Experiments in a simulated robotic application of penetrating a continuous patrol under occlusion show the improved performance and speedup of the new method and validate the utility of online IRL.
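
A hedged sketch of the incremental aspect follows (an assumed interface, not the I2RL method itself): sufficient statistics such as empirical feature expectations are folded in session by session, so no full rerun over all past demonstrations is needed.

    # Illustrative running update of mean feature expectations across sessions.
    class IncrementalFeatureStats:
        def __init__(self, num_features):
            self.mu = [0.0] * num_features
            self.count = 0

        def update(self, session_mu, session_size):
            """Fold in the mean feature expectations of one new demonstration session."""
            total = self.count + session_size
            self.mu = [(m * self.count + sm * session_size) / total
                       for m, sm in zip(self.mu, session_mu)]
            self.count = total

    stats = IncrementalFeatureStats(num_features=2)
    stats.update([1.0, 0.2], session_size=5)     # first batch of trajectories
    stats.update([0.6, 0.4], session_size=10)    # later batch, no full rerun needed
    print(stats.mu, stats.count)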


Decision-Theoretic Planning Under Anonymity in Agent Populations

Journal of Artificial Intelligence Research

We study the problem of self-interested planning under uncertainty in settings shared with more than a thousand other agents, each of which plans at its own individual level. We refer to such large numbers of agents as an agent population. The decision-theoretic formalism of the interactive partially observable Markov decision process (I-POMDP) is used to model the agent's self-interested planning. The first contribution of this article is a method for drastically scaling the finitely-nested I-POMDP to certain agent populations for the first time. Our method exploits two types of structure that are often exhibited by agent populations: anonymity and context-specific independence. We present a variant called the many-agent I-POMDP that models both these types of structure to plan efficiently under uncertainty in multiagent settings. In particular, the complexity of the belief update and solution in the many-agent I-POMDP is polynomial in the number of agents, compared with the exponential growth that challenges the original framework. While exploiting structure helps mitigate the curse of many agents, the well-known curse of history that afflicts I-POMDPs continues to challenge scalability in terms of the planning horizon. The second contribution of this article is an application of the branch-and-bound scheme to reduce the exponential growth of the search tree during lookahead. For this, we introduce new fast-computing upper and lower bounds for the exact value function of the many-agent I-POMDP. This speeds up the lookahead computations without trading off optimality, and reduces both memory and runtime complexity. The third contribution is a comprehensive empirical evaluation of the methods on three new problem domains: policing large protests, controlling traffic congestion at a busy intersection, and improving the AI for the popular Clash of Clans multiplayer game. We demonstrate the feasibility of exact self-interested planning in these large problems, and that our methods for speeding up the planning are effective. Altogether, these contributions represent a principled and significant advance toward moving self-interested planning under uncertainty to real-world applications.
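
A generic branch-and-bound sketch over a lookahead tree is shown below (illustrative bounds, not the paper's fast-computing bounds for the many-agent I-POMDP): branches whose upper bound cannot beat the best lower bound found so far are pruned without sacrificing optimality.

    # Illustrative maximization over a small lookahead tree with pruning.
    def branch_and_bound(node, children, upper, lower, best=float("-inf")):
        kids = children(node)
        if not kids:                     # leaf: its exact value
            return lower(node)
        value = float("-inf")
        for child in sorted(kids, key=upper, reverse=True):
            if upper(child) <= best:     # cannot improve on the incumbent: prune
                continue
            value = max(value, branch_and_bound(child, children, upper, lower, best))
            best = max(best, value)
        return value

    tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
    leaf_value = {"a1": 4.0, "a2": 6.0, "b1": 1.0, "b2": 2.0}
    bounds = {"root": (0.0, 10.0), "a": (4.0, 6.0), "b": (1.0, 3.0)}

    children = lambda n: tree.get(n, [])
    upper = lambda n: leaf_value.get(n, bounds.get(n, (0.0, 0.0))[1])
    lower = lambda n: leaf_value.get(n, bounds.get(n, (0.0, 0.0))[0])

    print(branch_and_bound("root", children, upper, lower))   # 6.0; branch "b" is pruned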


Freeway Merging in Congested Traffic based on Multipolicy Decision Making with Passive Actor Critic

arXiv.org Artificial Intelligence

Freeway merging in congested traffic is a significant challenge toward fully automated driving. Merging vehicles need to decide not only how to merge into a spot, but also where to merge. We present a method for freeway merging based on multipolicy decision making with a reinforcement learning method called passive actor-critic (pAC), which learns with less knowledge of the system and without active exploration. The method selects a merging spot candidate by using the state value learned with pAC. We evaluate our method using real traffic data. Our experiments show that pAC achieves a 92% success rate in merging onto a freeway, which is comparable to human decision making.
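
A hedged sketch of the multipolicy-style spot selection described above: each candidate merging spot is scored with a learned state-value function and the highest-valued spot is chosen. The toy value function below is a stand-in, not the value learned with pAC.

    def select_merging_spot(candidates, ego_state, value_fn):
        """Return the candidate merging spot with the highest estimated state value."""
        return max(candidates, key=lambda spot: value_fn(ego_state, spot))

    # Toy value: prefer larger gaps and smaller relative speed to neighboring vehicles.
    def toy_value(ego_state, spot):
        return spot["gap_m"] - 2.0 * abs(spot["rel_speed_mps"])

    spots = [{"id": 1, "gap_m": 8.0,  "rel_speed_mps": 1.5},
             {"id": 2, "gap_m": 12.0, "rel_speed_mps": 3.0},
             {"id": 3, "gap_m": 10.0, "rel_speed_mps": 0.5}]
    print(select_merging_spot(spots, ego_state=None, value_fn=toy_value)["id"])  # 3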