Goto

Collaborating Authors

 University of Massachusetts, Amherst



Imagination Machines: A New Challenge for Artificial Intelligence

AAAI Conferences

The aim of this paper is to propose a new overarching challenge for AI: the design of imagination machines. Imagination has been defined as the capacity to mentally transcend time, place, and/or circumstance. Much of the success of AI currently comes from a revolution in data science, specifically the use of deep learning neural networks to extract structure from data. This paper argues for the development of a new field called imagination science, which extends data science beyond its current realm of learning probability distributions from samples. Numerous examples are given in the paper to illustrate that human achievements in the arts, literature, poetry, and science may lie beyond the realm of data science, because they require abilities that go beyond finding correlations: for example, generating samples from a novel probability distribution different from the one given during training; causal reasoning to uncover interpretable explanations; or analogical reasoning to generalize to novel situations (e.g., imagination in art, representing alien life in a distant galaxy, understanding a story about talking animals, or inventing representations to model the large-scale structure of the universe). We describe the key challenges in automating imagination, discuss connections between ongoing research and imagination, and outline why automation of imagination provides a powerful launching pad for transforming AI.


Multi-Objective POMDPs with Lexicographic Reward Preferences

AAAI Conferences

We propose a model, the Lexicographic Partially Observable Markov Decision Process (LPOMDP), which extends POMDPs with lexicographic preferences over multiple value functions. It allows slack (slightly less-than-optimal values) in higher-priority preferences to facilitate improvement in lower-priority value functions. Many real-life situations are naturally captured by LPOMDPs with slack. We consider a semi-autonomous driving scenario in which time spent on the road is minimized while time spent driving autonomously is maximized. We propose two solutions to LPOMDPs, Lexicographic Value Iteration (LVI) and Lexicographic Point-Based Value Iteration (LPBVI), and establish convergence results and correctness within strong slack bounds. We test the algorithms using real-world road data provided by OpenStreetMap (OSM) for 10 major cities. Finally, we present GPU-based optimizations for point-based solvers, demonstrating that they enable us to quickly solve vastly larger LPOMDPs and other variants of POMDPs.
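
The lexicographic scheme is easiest to see on a fully observable MDP: solve for the value function of the highest-priority objective, restrict each state's action set to actions within the allowed slack of that optimum, and then optimize the next objective over the restricted sets. The sketch below illustrates this on an invented toy MDP; the transition matrix, rewards, and slack values are illustrative assumptions, not taken from the paper, and the full LPOMDP algorithms additionally handle partial observability.

    import numpy as np

    # Toy MDP: 3 states, 2 actions, two reward functions in priority order.
    # P[a, s, s'] is a transition probability; R[k][s, a] is the reward for objective k.
    P = np.array([[[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.2],
                   [0.1, 0.0, 0.9]],
                  [[0.2, 0.8, 0.0],
                   [0.5, 0.0, 0.5],
                   [0.0, 0.3, 0.7]]])
    R = [np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]),  # objective 1 (highest priority)
         np.array([[0.0, 1.0], [1.0, 0.0], [0.2, 0.8]])]  # objective 2
    gamma = 0.95
    slack = [0.1, 0.0]  # slack tolerated on each objective, in priority order

    def value_iteration(allowed, Rk, iters=500):
        # Value iteration restricted to the per-state `allowed` action sets.
        n_states = P.shape[1]
        V = np.zeros(n_states)
        for _ in range(iters):
            Q = Rk + gamma * np.einsum('ast,t->sa', P, V)
            V = np.array([Q[s, sorted(allowed[s])].max() for s in range(n_states)])
        return V, Q

    allowed = [set(range(P.shape[0])) for _ in range(P.shape[1])]  # start unrestricted
    for k, Rk in enumerate(R):
        V, Q = value_iteration(allowed, Rk)
        # Keep only actions within slack[k] of the best value for this objective.
        allowed = [{a for a in allowed[s] if Q[s, a] >= V[s] - slack[k]}
                   for s in range(P.shape[1])]

    print("lexicographically optimal actions per state:",
          [sorted(allowed[s]) for s in range(P.shape[1])])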


Compositional Vector Space Models for Knowledge Base Inference

AAAI Conferences

Traditional approaches to knowledge base completion have been based on symbolic representations. Low-dimensional vector embedding models proposed recently for this task are attractive since they generalize to possibly unlimited sets of relations. A significant drawback of previous embedding models for KB completion is that they merely support reasoning on individual relations (e.g., bornIn(X, Y) ⇒ nationality(X, Y)). In this work, we develop models for KB completion that support chains of reasoning on paths of any length using compositional vector space models. We construct compositional vector representations for the paths in the KB graph from the semantic vector representations of the binary relations in that path and perform inference directly in the vector space. Unlike previous methods, our approach can generalize to paths that are unseen in training and, in a zero-shot setting, predict target relations without supervised training data for that relation.
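
To make the idea concrete, the sketch below composes toy relation vectors along a path and scores a candidate target relation by cosine similarity. The relation names, dimensionality, and the additive composition function are illustrative assumptions; the paper's models learn the embeddings and the composition function from the KB.

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 8
    # Toy relation embeddings; in practice these are learned from the KB.
    relations = {name: rng.normal(size=dim)
                 for name in ["bornIn", "cityIn", "countryOf", "nationality"]}

    def compose(path):
        # Compose a relation path into one vector (here: vector addition).
        return np.sum([relations[r] for r in path], axis=0)

    def score(path, target):
        # Cosine similarity between the composed path and a target relation.
        p, t = compose(path), relations[target]
        return float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t)))

    # Does the path bornIn -> cityIn -> countryOf behave like nationality?
    print(score(["bornIn", "cityIn", "countryOf"], "nationality"))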


High-Confidence Off-Policy Evaluation

AAAI Conferences

Many reinforcement learning algorithms use trajectories collected from the execution of one or more policies to propose a new policy. Because execution of a bad policy can be costly or dangerous, techniques for evaluating the performance of the new policy without requiring its execution have been of recent interest in industry. Such off-policy evaluation methods, which estimate the performance of a policy using trajectories collected from the execution of other policies, heretofore have not provided confidences regarding the accuracy of their estimates. In this paper we propose an off-policy method for computing a lower confidence bound on the expected return of a policy.
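
As a rough illustration, the sketch below computes per-trajectory importance-sampled returns and a simple Hoeffding-style lower bound on their mean. The interfaces (pi_e, pi_b, the trajectory format) and the boundedness assumption are invented for illustration; the paper's method uses a concentration inequality better suited to the heavy-tailed importance weights that arise in practice.

    import numpy as np

    def is_returns(trajectories, pi_e, pi_b):
        # Per-trajectory importance-sampled returns. Each trajectory is a
        # list of (state, action, reward) triples; pi_e(s, a) and pi_b(s, a)
        # are action probabilities under the evaluation and behavior policies.
        out = []
        for traj in trajectories:
            weight, ret = 1.0, 0.0
            for s, a, r in traj:
                weight *= pi_e(s, a) / pi_b(s, a)
                ret += r
            out.append(weight * ret)
        return np.array(out)

    def hoeffding_lower_bound(x, delta=0.05, b=None):
        # 1 - delta lower bound on the mean of x, assuming each x_i lies
        # in [0, b]. Defaulting b to max(x) is itself an assumption, not
        # a guarantee; heavy-tailed weights call for tighter inequalities.
        b = float(x.max()) if b is None else b
        return x.mean() - b * np.sqrt(np.log(1.0 / delta) / (2.0 * len(x)))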


An Application of Multiagent Learning in Highly Dynamic Environments

AAAI Conferences

We explore the emergent behavior of game-theoretic algorithms in a highly dynamic applied setting in which the optimal goal for the agents is constantly changing. Our focus is on a variant of the traditional predator-prey problem called Defender. Consisting of multiple predators and multiple prey, Defender shares similarities with rugby, soccer, and football, in addition to current problems in the field of Multiagent Systems (MAS). Observations, communications, and knowledge about the world state are designed to be information-sparse, modeling real-world uncertainty. We propose a solution to Defender by means of the well-known multiagent learning algorithm fictitious play, and compare it with rational learning, regret matching, minimax regret, and a simple greedy strategy. We describe the modifications required to build these agents and the implications of applying them to our problem. We show that fictitious play is superior at evenly assigning predators to prey, despite Defender being a game of incomplete and imperfect information whose dimension and payoffs continually change. Interestingly, its performance is attributed to a synthesis of fictitious play, partial observability, and an anti-coordination game that reinforces the payoff of actions that were previously taken.
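
The core of fictitious play is simple to sketch: each agent maintains empirical counts of its opponent's past actions and best-responds to the resulting mixed strategy. The toy payoff matrix and random opponent below are illustrative stand-ins, not the Defender environment.

    import numpy as np

    rng = np.random.default_rng(1)
    n_actions = 3
    # Toy payoff matrix for one agent: payoff[a_self, a_other].
    payoff = rng.normal(size=(n_actions, n_actions))

    counts = np.ones(n_actions)  # smoothed counts of the opponent's actions
    for step in range(1000):
        belief = counts / counts.sum()            # empirical opponent strategy
        a_self = int(np.argmax(payoff @ belief))  # best response to the belief
        a_other = int(rng.integers(n_actions))    # stand-in for an observed action
        counts[a_other] += 1                      # update the empirical model

    print("final belief about the opponent:", counts / counts.sum())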


A Distributed Communication Architecture for Dynamic Multiagent Systems

AAAI Conferences

We investigate the problem of creating a robust, rapidly converging Distributed Communication Architecture (DCA) for the domain of low-bandwidth, single-channel Multiagent Systems (MAS) in which agents may drop in and out of communication without prior notification. The algorithm makes only three capability-based assumptions: 1) agents can classify a signal's message as noise, silence, or clarity; 2) agents can classify their own messages; and 3) agents can understand one another to some degree. The final structure allows agents to communicate in a round-robin manner without any centralized or hierarchical control. We evaluate DCA's convergence rate through four distinct experiments, including both a worst-case scenario in which all agents start simultaneously and a more common-case scenario in which agents offset their starting times. We measure effective throughput as the average number of clearly sent messages in a cycle to determine the amount of information successfully communicated. Lastly, we emulate situations found in problems with moving agents to show that agents who incorporate local observations can improve both their convergence rates and throughput.
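
The noise/silence/clarity feedback loop can be caricatured with a toy slot-assignment simulation: agents that hear noise (a collision) in their slot back off to a random slot, and the system converges once every agent transmits with clarity. This is a simplified stand-in for DCA, not the paper's algorithm; the agent count and cycle length are invented.

    import random

    random.seed(0)
    n_agents, cycle_len = 5, 8
    slots = [random.randrange(cycle_len) for _ in range(n_agents)]

    def feedback(slot):
        # Channel feedback for one slot: 'silence', 'clarity', or 'noise'.
        senders = [i for i, s in enumerate(slots) if s == slot]
        if not senders:
            return "silence"
        return "clarity" if len(senders) == 1 else "noise"

    for cycle in range(1, 101):
        for slot in range(cycle_len):
            if feedback(slot) == "noise":
                # Colliding agents back off to random slots for the next cycle.
                for i in [i for i, s in enumerate(slots) if s == slot]:
                    slots[i] = random.randrange(cycle_len)
        if all(feedback(s) == "clarity" for s in slots):
            print(f"converged after {cycle} cycles; slots = {slots}")
            break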


Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs

AAAI Conferences

Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multi-armed bandit DCOP algorithm on dynamic DCOPs.
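
The single-agent core of the second algorithm, tabular R-learning for average-reward problems, can be sketched as below; the distributed coordination machinery of MD-DCOPs is omitted, and the environment interface env.step is an invented placeholder.

    import numpy as np

    def r_learning(env, n_states, n_actions, alpha=0.1, beta=0.01,
                   epsilon=0.1, steps=100_000, seed=0):
        # Tabular R-learning for average-reward RL. `env.step(s, a)` is an
        # invented interface assumed to return (next_state, reward).
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))
        rho = 0.0  # running estimate of the average reward per step
        s = 0
        for _ in range(steps):
            exploit = rng.random() >= epsilon
            a = int(Q[s].argmax()) if exploit else int(rng.integers(n_actions))
            s2, r = env.step(s, a)
            # Average-reward temporal-difference update (no discounting).
            Q[s, a] += alpha * (r - rho + Q[s2].max() - Q[s, a])
            if exploit:  # update rho only on greedy (non-exploratory) steps
                rho += beta * (r - rho + Q[s2].max() - Q[s].max())
            s = s2
        return Q, rho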


Approximate Planning for Factored POMDPs

AAAI Conferences

We describe an approximate dynamic programming algorithm for partially observable Markov decision processes represented in factored form. Two complementary forms of approximation are used to simplify a piecewise linear and convex value function, where each linear facet of the function is represented compactly by an algebraic decision diagram. In one form of approximation, the degree of state abstraction is increased by aggregating states with similar values. In the second form of approximation, the value function is simplified by removing linear facets that contribute marginally to value. We derive an error bound that applies to both forms of approximation. Experimental results show that this approach improves the performance of dynamic programming and extends the range of problems it can solve.
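
The second form of approximation can be sketched on flat alpha-vectors: drop any facet whose removal lowers the value function by at most epsilon at every belief point considered. The sketch below uses plain vectors and an invented belief set; the paper represents each facet compactly as an algebraic decision diagram, which is omitted here.

    import numpy as np

    def prune_facets(alphas, beliefs, epsilon=1e-3):
        # Drop linear facets (rows of `alphas`) whose removal lowers the
        # value function by at most `epsilon` at every belief in `beliefs`.
        # alphas: (k, n) facets; beliefs: (m, n) belief points.
        remaining = [np.asarray(a, dtype=float) for a in alphas]
        kept = []
        while remaining:
            a = remaining.pop()
            others = kept + remaining
            if not others:
                kept.append(a)
                continue
            best_without = np.max(beliefs @ np.array(others).T, axis=1)
            gain = beliefs @ a - best_without  # how much this facet adds
            if gain.max() > epsilon:  # the facet matters somewhere: keep it
                kept.append(a)
        return np.array(kept)

    # Example: the middle facet is nearly a duplicate and gets pruned.
    A = np.array([[1.0, 0.0], [0.999, 0.0005], [0.0, 1.0]])
    B = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])  # belief points
    print(prune_facets(A, B))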


Strategy Mining

AAAI Conferences

Strategy mining is a new area of research concerned with discovering strategies for decision-making. It is motivated by how similarity is assessed in retrospect in law. In the legal domain, when both case facts and court decisions are present, it is often useful to assess similarity by accounting for both case facts and case outcomes. In this paper, we formulate the strategy mining problem as a clustering problem with the goal of finding clusters that represent disparate conditional dependencies of decision labels on other features. Existing clustering algorithms are ill-suited to clustering dependency because they either assume feature independence, as K-means does, or only consider the co-occurrence of features without explicitly modeling the special dependency of the decision label on other features, as Latent Dirichlet Allocation (LDA) does. We propose an Expectation Maximization (EM) style unsupervised learning algorithm for dependency clustering. Like EM, our algorithm is grounded in statistical learning theory. It minimizes the empirical risk of decision tree learning. Unlike other clustering algorithms, our algorithm is resistant to irrelevant features, and its learned clusters, modeled by decision trees, are strongly interpretable and predictive. We systematically evaluate both the convergence property and solution quality of our algorithm using a common law dataset comprised of actual cases. Experimental results show that our algorithm significantly outperforms K-means and LDA at clustering dependency.
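
A minimal EM-style loop in the spirit of the algorithm: the E-step assigns each case to the cluster whose decision tree best predicts its decision label, and the M-step refits one tree per cluster. The use of scikit-learn trees, the depth limit, and the 0/1 scoring rule are illustrative assumptions; the paper's empirical-risk formulation is not reproduced here.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def dependency_clustering(X, y, k=2, iters=20, seed=0):
        # X: (n, d) feature array; y: (n,) decision labels.
        # E-step: assign each case to the cluster whose tree predicts its
        # label. M-step: refit one depth-limited tree per cluster.
        rng = np.random.default_rng(seed)
        z = rng.integers(k, size=len(y))  # random initial assignment
        trees = []
        for _ in range(iters):
            trees = []
            for c in range(k):
                idx = np.flatnonzero(z == c)
                if len(idx) == 0:  # revive an emptied cluster with random cases
                    idx = rng.integers(len(y), size=5)
                trees.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))
            # 0/1 score: does each cluster's tree predict each case's label?
            scores = np.column_stack([t.predict(X) == y for t in trees])
            z_new = scores.argmax(axis=1)
            if np.array_equal(z_new, z):
                break
            z = z_new
        return z, trees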