Monte-Carlo Planning in Large POMDPs

Neural Information Processing Systems

This paper introduces a Monte-Carlo algorithm for online planning in large POMDPs. The algorithm combines a Monte-Carlo update of the agent's belief state with a Monte-Carlo tree search from the current belief state. The new algorithm, POMCP, has two important properties. First, Monte-Carlo sampling is used to break the curse of dimensionality both during belief state updates and during planning. Second, only a black box simulator of the POMDP is required, rather than explicit probability distributions. These properties enable POMCP to plan effectively in significantly larger POMDPs than has previously been possible. We demonstrate its effectiveness in three large POMDPs. We scale up a well-known benchmark problem, Rocksample, by several orders of magnitude. We also introduce two challenging new POMDPs: 10x10 Battleship and Partially Observable PacMan, with approximately 10^18 and 10^56 states respectively. Our Monte-Carlo planning algorithm achieved a high level of performance with no prior knowledge, and was also able to exploit simple domain knowledge to achieve better results with less search. POMCP is the first general purpose planner to achieve high performance in such large and unfactored POMDPs.
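
The Monte-Carlo belief update described above can be illustrated with a small sketch. It is not the paper's implementation: it assumes an unweighted particle belief updated by rejection sampling, and the simulator `simulate`, the two-state toy problem, and the 0.85 observation accuracy are all hypothetical stand-ins chosen only to show that a black-box generative model is enough.

```python
import random

def simulate(state, action, rng):
    """Hypothetical black-box simulator for a toy problem (an assumption).

    The hidden state is 0 or 1 and never changes. The observation matches
    the state with probability 0.85. Returns (next_state, observation, reward).
    """
    observation = state if rng.random() < 0.85 else 1 - state
    return state, observation, -1.0

def update_belief(particles, action, real_observation, simulator, rng, n_particles):
    """Approximate the next belief by rejection sampling.

    Draw a state from the current particles, push it through the simulator,
    and keep the successor only if the simulated observation matches the
    observation the agent really received.
    """
    new_particles = []
    attempts = 0
    max_attempts = 100 * n_particles  # guard against very unlikely observations
    while len(new_particles) < n_particles and attempts < max_attempts:
        attempts += 1
        state = rng.choice(particles)
        next_state, observation, _ = simulator(state, action, rng)
        if observation == real_observation:
            new_particles.append(next_state)
    return new_particles

rng = random.Random(0)
prior = [rng.randrange(2) for _ in range(1000)]       # roughly uniform belief
posterior = update_belief(prior, "listen", 1, simulate, rng, 1000)
fraction_state_1 = sum(posterior) / len(posterior)    # should be near 0.85
print(f"belief that state is 1: {fraction_state_1:.2f}")
```

No transition or observation probabilities are passed to `update_belief`; it only calls the simulator, which is the property the abstract emphasizes.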


On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient

Neural Information Processing Systems

Likelihood ratio policy gradient methods have been some of the most successful reinforcement learning algorithms, especially for learning on physical systems. We describe how the likelihood ratio policy gradient can be derived from an importance sampling perspective. This derivation highlights how likelihood ratio methods under-use past experience by (a) using the past experience to estimate only the gradient of the expected return $U(\theta)$ at the current policy parameterization $\theta$, rather than to obtain a more complete estimate of $U(\theta)$, and (b) using past experience under the current policy only rather than using all past experience to improve the estimates. We present a new policy search method, which leverages both of these observations as well as generalized baselines, a new technique which generalizes commonly used baseline techniques for policy gradient methods. Our algorithm outperforms standard likelihood ratio policy gradient algorithms on several testbeds.
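
The connection named in the title can be sketched in two lines. The notation beyond $U(\theta)$ is assumed here rather than taken from the abstract: $\tau$ is a trajectory, $R(\tau)$ its return, $p_\theta(\tau)$ its probability under parameters $\theta$, and $\tau_1, \dots, \tau_m$ are trajectories sampled under some earlier parameters $\theta'$.

```latex
% Importance-sampled estimate of the expected return from samples drawn under \theta'
\begin{align}
U(\theta) = \mathbb{E}_{\tau \sim p_\theta}\left[ R(\tau) \right]
  &\approx \frac{1}{m} \sum_{i=1}^{m} \frac{p_\theta(\tau_i)}{p_{\theta'}(\tau_i)} \, R(\tau_i) \\
% Differentiating and evaluating at \theta = \theta' gives the likelihood ratio gradient
\nabla_\theta U(\theta) \Big|_{\theta = \theta'}
  &\approx \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta \log p_\theta(\tau_i) \Big|_{\theta = \theta'} \, R(\tau_i)
\end{align}
```

The second line follows from the first because $\nabla_\theta p_\theta(\tau) / p_{\theta'}(\tau)$ equals $\nabla_\theta \log p_\theta(\tau)$ when evaluated at $\theta = \theta'$. Read this way, the usual gradient estimator uses only the slope of the importance-sampled estimate at one point, which is observation (a) in the abstract.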


Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning

Neural Information Processing Systems

We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the database. We propose two hashing-based solutions. Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and hyperplane query. Both use hashing to retrieve near points in sub-linear time. Our first method's preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods' tradeoffs, and show that they make it practical to perform active selection with millions of unlabeled points.
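
One way a two-bit key can be made sensitive to the angle between a point and a hyperplane normal is sketched below. The abstract does not spell out the hash, so the construction is an assumption for illustration: two random Gaussian directions `u` and `v`, with the second bit computed on the negated normal for queries. Under that assumption the collision probability is (θ/π)(1 − θ/π), where θ is the angle between point and normal, which peaks at 1/4 when the point is perpendicular to the normal, that is, lying on a hyperplane through the origin.

```python
import random

def sign_bit(a, b):
    """Return 1 if the dot product of a and b is non-negative, else 0."""
    return 1 if sum(x * y for x, y in zip(a, b)) >= 0 else 0

def make_hash(dim, rng):
    """Draw one random two-bit hash (assumed construction, see lead-in)."""
    u = [rng.gauss(0, 1) for _ in range(dim)]
    v = [rng.gauss(0, 1) for _ in range(dim)]

    def hash_point(x):
        # Database points use both random directions as-is.
        return (sign_bit(u, x), sign_bit(v, x))

    def hash_query(w):
        # The hyperplane normal is negated for the second bit, so a match
        # requires the point to be "close" to w and "far" from w at once.
        return (sign_bit(u, w), sign_bit(v, [-c for c in w]))

    return hash_point, hash_query

def collision_rate(x, w, trials, rng):
    """Estimate how often point x and normal w receive the same key."""
    hits = 0
    for _ in range(trials):
        hash_point, hash_query = make_hash(len(x), rng)
        if hash_point(x) == hash_query(w):
            hits += 1
    return hits / trials

rng = random.Random(1)
normal = [1.0, 0.0]
on_plane = [0.0, 1.0]     # perpendicular to the normal
off_plane = [1.0, 0.1]    # nearly parallel to the normal
near_rate = collision_rate(on_plane, normal, 4000, rng)
far_rate = collision_rate(off_plane, normal, 4000, rng)
print(f"on-plane: {near_rate:.3f}, off-plane: {far_rate:.3f}")
```

In a retrieval setting, several such keys would typically be concatenated and stored in hash tables so that only colliding points are examined.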


A Monte Carlo AIXI Approximation

arXiv.org Artificial Intelligence

This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. Our approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a new Monte-Carlo Tree Search algorithm along with an agent-specific extension to the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a variety of stochastic and partially observable domains. We conclude by proposing a number of directions for future research.


Best-First Heuristic Search for Multicore Machines

Journal of Artificial Intelligence Research

To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we compare different approaches to parallel best-first search in a shared-memory setting. We present a new method, PBNF, that uses abstraction to partition the state space and to detect duplicate states without requiring frequent locking. PBNF allows speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, proving its correctness using temporal logic. Our approach is general, allowing it to extend easily to suboptimal and anytime heuristic search. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using 8-core machines, we show that A*, weighted A* and Anytime weighted A* implemented using PBNF yield faster search than improved versions of previous parallel search proposals.


Distributed Graph Coloring: An Approach Based on the Calling Behavior of Japanese Tree Frogs

arXiv.org Artificial Intelligence

Graph coloring, also known as vertex coloring, considers the problem of assigning colors to the nodes of a graph such that adjacent nodes do not share the same color. The optimization version of the problem concerns the minimization of the number of used colors. In this paper we deal with the problem of finding valid colorings of graphs in a distributed way, that is, by means of an algorithm that only uses local information for deciding the color of the nodes. Such algorithms dispense with any central control. Because quite a few practical applications require finding colorings in a distributed way, interest in distributed algorithms for graph coloring has been growing during the last decade. As an example, consider wireless ad-hoc and sensor networks, where tasks such as the assignment of frequencies or the assignment of TDMA slots are strongly related to graph coloring. The algorithm proposed in this paper is inspired by the calling behavior of Japanese tree frogs. Male frogs use their calls to attract females. Interestingly, groups of males that are located near each other desynchronize their calls. This is because female frogs are only able to correctly localize the male frogs when their calls are not too close in time. We experimentally show that our algorithm is very competitive with the current state of the art, using different sets of problem instances and comparing to one of the most competitive algorithms from the literature.
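
To make "valid coloring" and "local information only" concrete, here is a minimal sketch. It is not the frog-inspired algorithm from the paper: it swaps in a plain greedy rule in which nodes wake one at a time and each picks the smallest color not currently held by a neighbor. The function names and the five-node ring are hypothetical.

```python
def greedy_local_coloring(adjacency, wake_order):
    """Color nodes one at a time using only each node's own neighborhood.

    adjacency maps every node to a list of its neighbors. When a node wakes,
    it looks at the colors its neighbors hold right now and takes the
    smallest non-negative integer that none of them uses.
    """
    color = {node: None for node in adjacency}
    for node in wake_order:
        taken = {color[nb] for nb in adjacency[node] if color[nb] is not None}
        candidate = 0
        while candidate in taken:
            candidate += 1
        color[node] = candidate
    return color

def is_valid(adjacency, color):
    """A coloring is valid when no edge joins two nodes of the same color."""
    return all(color[u] != color[v] for u in adjacency for v in adjacency[u])

# A ring of five nodes: an odd cycle, so at least three colors are needed.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
coloring = greedy_local_coloring(ring, wake_order=[0, 1, 2, 3, 4])
print(coloring, is_valid(ring, coloring))
```

Each decision reads only the colors of adjacent nodes, so no node needs a global view of the graph. The sketch does rely on nodes acting one after another; handling simultaneous decisions is where desynchronization schemes such as the one in the paper come in.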


A Utility-Theoretic Approach to Privacy in Online Services

Journal of Artificial Intelligence Research

Online offerings such as web search, news portals, and e-commerce applications face the challenge of providing high-quality service to a large, heterogeneous user base. Recent efforts have highlighted the potential to improve performance by introducing methods to personalize services based on special knowledge about users and their context. For example, a user's demographics, location, and past search and browsing may be useful in enhancing the results offered in response to web search queries. However, reasonable concerns about privacy held by users, providers, and government agencies acting on behalf of citizens may limit access by services to such information. We introduce and explore an economics of privacy in personalization, where people can opt to share personal information, in a standing or on-demand manner, in return for expected enhancements in the quality of an online service. We focus on the example of web search and formulate realistic objective functions for search efficacy and privacy. We demonstrate how we can find a provably near-optimal optimization of the utility-privacy tradeoff in an efficient manner. We evaluate our methodology on data drawn from a log of the search activity of volunteer participants. We separately assess users' preferences about privacy and utility via a large-scale survey, aimed at eliciting preferences about people's willingness to trade the sharing of personal data in return for gains in search efficiency. We show that a significant level of personalization can be achieved using a relatively small amount of information about users.


Evolutionary Robustness Checking in the Artificial Anasazi Model

AAAI Conferences

Using the well-known Artificial Anasazi simulation for a case study, we investigate the use of genetic algorithms (GAs) for performing two common tasks related to robustness checking of agent-based models: parameter calibration and sensitivity analysis. In the calibration task, we demonstrate that a GA approach is able to find parameters that are equally good or better at minimizing error versus historical data, compared to a previous factorial grid-based approach. The GA approach also allows us to explore a wider range of parameters and parameter settings. Previous univariate sensitivity analysis on the Artificial Anasazi model did not consider potentially complex/nonlinear interactions between parameters. With the GA-based approach, we perform multivariate sensitivity analysis to discover how greatly the model can diverge from historical data, while the parameters are constrained within a close range of previously calibrated values. We show that by varying multiple parameters within a 10% range, the model can produce dramatically and qualitatively different results, and further demonstrate the utility of sensitivity analysis for model testing, by the discovery of a small coding error. Through this case study, we discuss some of the issues that can arise with calibration and sensitivity analysis of agent-based models.


A Partial Taxonomy of Substitutability and Interchangeability

arXiv.org Artificial Intelligence

Substitutability, interchangeability and related concepts in Constraint Programming were introduced approximately twenty years ago and have given rise to considerable subsequent research. We survey this work, classify, and relate the different concepts, and indicate directions for future work, in particular with respect to making connections with research into symmetry breaking. This paper is a condensed version of a larger work in progress.


A Monte Carlo Approach for Football Play Generation

AAAI Conferences

Learning effective policies in multi-agent adversarial games is a significant challenge since the search space can be prohibitively large when the actions of all the agents are considered simultaneously. Recent advances in Monte Carlo search methods have produced good results in single-agent games like Go with very large search spaces. In this paper, we propose a variation on the Monte Carlo method, UCT (Upper Confidence Bound Trees), for multi-agent, continuous-valued, adversarial games and demonstrate its utility at generating American football plays for Rush Football 2008. In football, like in many other multi-agent games, the actions of all of the agents are not equally crucial to gameplay success. By automatically identifying key players from historical game play, we can focus the UCT search on player groupings that have the largest impact on yardage gains in a particular formation.
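
For readers unfamiliar with UCT, the selection step it applies at each tree node can be sketched as follows. This is the standard UCB1 rule, not the multi-agent, continuous-valued variation the paper proposes; the function name, the dictionary layout, and the default exploration constant are assumptions made for the example.

```python
import math

def ucb1_select(children, exploration=math.sqrt(2)):
    """Pick the child action with the highest UCB1 score.

    children maps each action to a (visit_count, total_reward) pair.
    Unvisited actions are tried first. Otherwise the score is the mean
    reward plus an exploration bonus that shrinks as an action is visited.
    """
    for action, (visits, _) in children.items():
        if visits == 0:
            return action
    total_visits = sum(visits for visits, _ in children.values())

    def score(action):
        visits, total_reward = children[action]
        mean = total_reward / visits
        bonus = exploration * math.sqrt(math.log(total_visits) / visits)
        return mean + bonus

    return max(children, key=score)

# Equal visit counts: the action with the higher mean reward wins.
exploit_choice = ucb1_select({"run": (10, 5.0), "pass": (10, 7.0)})
# Very unequal visit counts: the rarely tried action gets a large bonus.
explore_choice = ucb1_select({"run": (1, 0.5), "pass": (100, 60.0)})
print(exploit_choice, explore_choice)
```

The two calls show the trade-off the rule encodes: with equal experience it exploits the better average, and with lopsided experience it revisits the under-sampled action.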