Doshi, Prashant

Reinforcement Learning for Heterogeneous Teams with PALO Bounds Artificial Intelligence

We introduce reinforcement learning for heterogeneous teams in which rewards for an agent are additively factored into local costs, stimuli unique to each agent, and global rewards, those shared by all agents in the domain. Motivating domains include coordination of varied robotic platforms, which incur different costs for the same action, but share an overall goal. We present two templates for learning in this setting with factored rewards: a generalization of Perkins' Monte Carlo exploring starts for POMDPs to canonical MPOMDPs, with a single policy mapping joint observations of all agents to joint actions (MCES-MP); and another with each agent individually mapping joint observations to their own action (MCES-FMP). We use probably approximately local optimal (PALO) bounds to analyze sample complexity, instantiating these templates to PALO learning. We promote sample efficiency by including a policy space pruning technique, and evaluate the approaches on three domains of heterogeneous agents demonstrating that MCES-FMP yields improved policies in less samples compared to MCES-MP and a previous benchmark.

On Markov Games Played by Bayesian and Boundedly-Rational Players

AAAI Conferences

We present a new game-theoretic framework in which Bayesian players with bounded rationality engage in a Markov game and each has private but incomplete information regarding other players' types. Instead of utilizing Harsanyi's abstract types and a common prior, we construct intentional player types whose structure is explicit and induces a {\em finite-level} belief hierarchy. We characterize an equilibrium in this game and establish the conditions for existence of the equilibrium. The computation of finding such equilibria is formalized as a constraint satisfaction problem and its effectiveness is demonstrated on two cooperative domains.

Dynamic Sum Product Networks for Tractable Inference on Sequence Data (Extended Version) Artificial Intelligence

Sum-Product Networks (SPN) have recently emerged as a new class of tractable probabilistic graphical models. Unlike Bayesian networks and Markov networks where inference may be exponential in the size of the network, inference in SPNs is in time linear in the size of the network. Since SPNs represent distributions over a fixed set of variables only, we propose dynamic sum product networks (DSPNs) as a generalization of SPNs for sequence data of varying length. A DSPN consists of a template network that is repeated as many times as needed to model data sequences of any length. We present a local search technique to learn the structure of the template network. In contrast to dynamic Bayesian networks for which inference is generally exponential in the number of variables per time slice, DSPNs inherit the linear inference complexity of SPNs. We demonstrate the advantages of DSPNs over DBNs and other models on several datasets of sequence data.

Bayesian Markov Games with Explicit Finite-Level Types

AAAI Conferences

In impromptu or ad hoc settings, participating players are precluded from precoordination. Subsequently, each player's own model is private and includes some uncertainty about the others' types or behaviors. Harsanyi's formulation of a Bayesian game lays emphasis on this uncertainty while the players each play exactly one turn. We propose a new game-theoretic framework where Bayesian players engage in a Markov game and each has private but imperfect information regarding other players' types. Consequently, we construct player types whose structure is explicit and includes a finite level belief hierarchy instead of utilizing Harsanyi's abstract types and a common prior distribution. We formalize this new framework and demonstrate its effectiveness on two standard ad hoc teamwork domains involving two or more ad hoc players.

Decision Sum-Product-Max Networks

AAAI Conferences

Sum-Product Networks (SPNs) were recently proposed as a new class of probabilistic graphical models that guarantee tractable inference, even on models with high-treewidth. In this paper, we propose a new extension to SPNs, called Decision Sum-Product-Max Networks (Decision-SPMNs), that makes SPNs suitable for discrete multi-stage decision problems. We present an algorithm that solves Decision-SPMNs in a time that is linear in the size of the network. We also present algorithms to learn the parameters of the network from data.

Bayesian Markov Games with Explicit Finite-Level Types

AAAI Conferences

We present a new game-theoretic framework where Bayesian players engage in a Markov game and each has private but imperfect information regarding other players' types. Instead of utilizing Harsanyi's abstract types and a common prior distribution, we construct player types whose structure is explicit and induces a finite level belief hierarchy. We characterize equilibria in this game and formalize the computation of finding such equilibria as a constraint satisfaction problem. The effectiveness of the new framework is demonstrated on two ad hoc team work domains.

Toward Estimating Others' Transition Models Under Occlusion for Multi-Robot IRL

AAAI Conferences

Multi-robot inverse reinforcement learning (mIRL) is broadly useful for learning, from observations, the behaviors of multiple robots executing fixed trajectories and interacting with each other. In this paper, we relax a crucial assumption in IRL to make it better suited for wider robotic applications: we allow the transition functions of other robots to be stochastic and do not assume that the transition error probabilities are known to the learner. Challenged by occlusion where large portions of others' state spaces are fully hidden, we present a new approach that maps stochastic transitions to distributions over features. Then, the underconstrained problem is solved using nonlinear optimization that maximizes entropy to learn the transition function of each robot from occluded observations. Our methods represent significant and first steps toward making mIRL pragmatic.

Individual Planning in Agent Populations: Exploiting Anonymity and Frame-Action Hypergraphs Artificial Intelligence

Interactive partially observable Markov decision processes (I-POMDP) provide a formal framework for planning for a self-interested agent in multiagent settings. An agent operating in a multiagent environment must deliberate about the actions that other agents may take and the effect these actions have on the environment and the rewards it receives. Traditional I-POMDPs model this dependence on the actions of other agents using joint action and model spaces. Therefore, the solution complexity grows exponentially with the number of agents thereby complicating scalability. In this paper, we model and extend anonymity and context-specific independence -- problem structures often present in agent populations -- for computational gain. We empirically demonstrate the efficiency from exploiting these problem structures by solving a new multiagent problem involving more than 1,000 agents.

Monte Carlo Sampling Methods for Approximating Interactive POMDPs Artificial Intelligence

Partially observable Markov decision processes (POMDPs) provide a principled framework for sequential planning in uncertain single agent settings. An extension of POMDPs to multiagent settings, called interactive POMDPs (I-POMDPs), replaces POMDP belief spaces with interactive hierarchical belief systems which represent an agent's belief about the physical world, about beliefs of other agents, and about their beliefs about others' beliefs. This modification makes the difficulties of obtaining solutions due to complexity of the belief and policy spaces even more acute. We describe a general method for obtaining approximate solutions of I-POMDPs based on particle filtering (PF). We introduce the interactive PF, which descends the levels of the interactive belief hierarchies and samples and propagates beliefs at each level. The interactive PF is able to mitigate the belief space complexity, but it does not address the policy space complexity. To mitigate the policy space complexity -- sometimes also called the curse of history -- we utilize a complementary method based on sampling likely observations while building the look ahead reachability tree. While this approach does not completely address the curse of history, it beats back the curse's impact substantially. We provide experimental results and chart future work.

Improved Convergence of Iterative Ontology Alignment using Block-Coordinate Descent

AAAI Conferences

A wealth of ontologies, many of which overlap in their scope, has made aligning ontologies an important problem for the semantic Web. Consequently, several algorithms now exist for automatically aligning ontologies, with mixed success in their performances. Crucial challenges for these algorithms involve scaling to large ontologies, and as applications of ontology alignment evolve, performing the alignment in a reasonable amount of time without compromising on the quality of the alignment. A class of alignment algorithms is iterative and often consumes more time than others while delivering solutions of high quality. We present a novel and general approach for speeding up the multivariable optimization process utilized by these algorithms. Specifically, we use the technique of block-coordinate descent in order to possibly improve the speed of convergence of the iterative alignment techniques. We integrate this approach into three well-known alignment systems and show that the enhanced systems generate similar or improved alignments in significantly less time on a comprehensive testbed of ontology pairs. This represents an important step toward making alignment techniques computationally more feasible.