Stone, Peter


Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

arXiv.org Artificial Intelligence

For an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. Current methods for exact high confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound. Existing model-based methods only address the problem in discrete state spaces. Since exact bounds are intractable for many domains, we trade off strict guarantees of safety for more data-efficient approximate bounds. In this context, we propose two bootstrapping off-policy evaluation methods that use learned MDP transition models to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces. Since direct use of a model may introduce bias, we derive a theoretical upper bound on model bias for the case in which the model transition function is estimated with i.i.d. trajectories. This bound broadens our understanding of the conditions under which model-based methods have high bias. Finally, we empirically evaluate our proposed methods and analyze the settings in which different bootstrapping off-policy confidence interval methods succeed and fail.
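
As a rough illustration of the bootstrapping idea mentioned in this abstract, the sketch below computes a generic percentile-bootstrap lower confidence bound on a mean return. It is not the paper's two methods: the function name `bootstrap_lower_bound` and its parameters are hypothetical, and it assumes one return estimate per trajectory is already available (for example from a learned model), whereas the abstract describes learning MDP transition models from the data.

```python
import random
from statistics import mean


def bootstrap_lower_bound(returns, confidence=0.95, n_boot=2000, seed=0):
    """Approximate lower confidence bound on the mean of `returns`.

    `returns` is a hypothetical list of per-trajectory return estimates.
    The bound is approximate: it trades strict guarantees for data efficiency.
    """
    rng = random.Random(seed)
    n = len(returns)
    # Resample the data with replacement and record the mean of each resample.
    boot_means = sorted(mean(rng.choices(returns, k=n)) for _ in range(n_boot))
    # The (1 - confidence) quantile of the bootstrap means is the lower bound.
    idx = int((1.0 - confidence) * n_boot)
    return boot_means[idx]


if __name__ == "__main__":
    estimates = [float(x) for x in range(10)]
    print(bootstrap_lower_bound(estimates))
```

With only a handful of trajectories the bound can be loose or optimistic, which is the kind of setting the abstract says the paper analyzes empirically.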


Automated Design of Robust Mechanisms

AAAI Conferences

We introduce a new class of mechanisms, robust mechanisms, that is an intermediary between ex-post mechanisms and Bayesian mechanisms. This new class of mechanisms allows the mechanism designer to incorporate imprecise estimates of the distribution over bidder valuations in a way that provides strong guarantees that the mechanism will perform at least as well as ex-post mechanisms, while in many cases performing better. We further extend this class to mechanisms that are with high probability incentive compatible and individually rational, ε-robust mechanisms. Using techniques from automated mechanism design and robust optimization, we provide an algorithm polynomial in the number of bidder types to design robust and ε-robust mechanisms. We show experimentally that this new class of mechanisms can significantly outperform traditional mechanism design techniques when the mechanism designer has an estimate of the distribution and the bidder’s valuation is correlated with an externally verifiable signal.


Autonomous Electricity Trading Using Time-of-Use Tariffs in a Competitive Market

AAAI Conferences

This paper studies the impact of Time-Of-Use (TOU) tariffs in a competitive electricity marketplace. Specifically, it focuses on the question of how an autonomous broker agent should optimize TOU tariffs in a competitive retail market, and what impact such tariffs have on the economy. We formalize the problem of TOU tariff optimization and propose an algorithm for approximating its solution. We extensively experiment with our algorithm in a large-scale, detailed electricity retail market simulation of the Power Trading Agent Competition (Power TAC) and: 1) find that our algorithm results in 15% peak-demand reduction, 2) find that its peak-flattening results in greater profit and/or profit-share for the broker and allows it to win against the 1st and 2nd place brokers from the Power TAC 2014 finals, and 3) analyze several economic implications of using TOU tariffs in competitive retail markets.


When Security Games Go Green: Designing Defender Strategies to Prevent Poaching and Illegal Fishing

AAAI Conferences

Building on the successful applications of Stackelberg Security Games (SSGs) to protect infrastructure, researchers have begun focusing on applying game theory to green security domains such as protection of endangered animals and fish stocks. Previous efforts in these domains optimize defender strategies based on the standard Stackelberg assumption that the adversaries become fully aware of the defender's strategy before taking action. Unfortunately, this assumption is inappropriate since adversaries in green security domains often lack the resources to fully track the defender strategy. This paper (i) introduces Green Security Games (GSGs), a novel game model for green security domains with a generalized Stackelberg assumption; (ii) provides algorithms to plan effective sequential defender strategies (such planning was absent in previous work); (iii) proposes a novel approach to learn adversary models that further improves defender performance; and (iv) provides detailed experimental analysis of proposed approaches.


Representative Selection in Non Metric Datasets

arXiv.org Artificial Intelligence

This paper considers the problem of representative selection: choosing a subset of data points from a dataset that best represents its overall set of elements. This subset needs to inherently reflect the type of information contained in the entire set, while minimizing redundancy. For such purposes, clustering may seem like a natural approach. However, existing clustering methods are not ideally suited for representative selection, especially when dealing with non-metric data, where only a pairwise similarity measure exists. In this paper we propose $\delta$-medoids, a novel approach that can be viewed as an extension to the $k$-medoids algorithm and is specifically suited for sample representative selection from non-metric data. We empirically validate $\delta$-medoids in two domains, namely music analysis and motion analysis. We also show some theoretical bounds on the performance of $\delta$-medoids and the hardness of representative selection in general.
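
To make the problem setting concrete, here is a minimal sketch of threshold-based representative selection that needs only a pairwise dissimilarity function. It is a plain greedy covering pass, not the $\delta$-medoids algorithm itself; the names `select_representatives`, `dissimilarity`, and `delta` are assumptions introduced for illustration.

```python
def select_representatives(items, dissimilarity, delta):
    """Greedy one-pass selection of representatives.

    An item becomes a new representative only if it is farther than `delta`
    from every representative chosen so far, so every item ends up within
    `delta` of at least one representative. No metric properties are assumed.
    """
    representatives = []
    for item in items:
        if all(dissimilarity(item, rep) > delta for rep in representatives):
            representatives.append(item)
    return representatives


if __name__ == "__main__":
    points = [0, 1, 2, 10, 11, 20]
    print(select_representatives(points, lambda a, b: abs(a - b), delta=2))
```

The result depends on the order in which items are visited; a fuller method would also revisit and refine the chosen representatives.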


UT Austin Villa 2014: RoboCup 3D Simulation League Champion via Overlapping Layered Learning

AAAI Conferences

Layered learning is a hierarchical machine learning paradigm that enables learning of complex behaviors by incrementally learning a series of sub-behaviors. A key feature of layered learning is that higher layers directly depend on the learned lower layers. In its original formulation, lower layers were frozen prior to learning higher layers. This paper considers an extension to the paradigm that allows learning certain behaviors independently, and then later stitching them together by learning at the "seams" where their influences overlap. The UT Austin Villa 2014 RoboCup 3D simulation team, using such overlapping layered learning, learned a total of 19 layered behaviors for a simulated soccer-playing robot, organized both in series and in parallel. To the best of our knowledge this is more than three times the number of layered behaviors in any prior layered learning system. Furthermore, the complete learning process is repeated on four different robot body types, showcasing its generality as a paradigm for efficient behavior learning. The resulting team won the RoboCup 2014 championship with an undefeated record, scoring 52 goals and conceding none. This paper includes a detailed experimental analysis of the team's performance and the overlapping layered learning approach that led to its success.


Cooperating with Unknown Teammates in Complex Domains: A Robot Soccer Case Study of Ad Hoc Teamwork

AAAI Conferences

Many scenarios require that robots work together as a team in order to effectively accomplish their tasks. However, pre-coordinating these teams may not always be possible given the growing number of companies and research labs creating these robots. Therefore, it is desirable for robots to be able to reason about ad hoc teamwork and adapt to new teammates on the fly. Past research on ad hoc teamwork has focused on relatively simple domains, but this paper demonstrates that agents can reason about ad hoc teamwork in complex scenarios. To handle these complex scenarios, we introduce a new algorithm, PLASTIC–Policy, that builds on an existing ad hoc teamwork approach. Specifically, PLASTIC–Policy learns policies to cooperate with past teammates and reuses these policies to quickly adapt to new teammates. This approach is tested in the 2D simulation soccer league of RoboCup using the half field offense task.


SCRAM: Scalable Collision-avoiding Role Assignment with Minimal-Makespan for Formational Positioning

AAAI Conferences

Teams of mobile robots often need to divide up subtasks efficiently. In spatial domains, a key criterion for doing so may depend on distances between robots and the subtasks' locations. This paper considers a specific such criterion, namely how to assign interchangeable robots, represented as point masses, to a set of target goal locations within an open two-dimensional space such that the makespan (time for all robots to reach their target locations) is minimized while also preventing collisions among robots. We present scalable (computable in polynomial time) role assignment algorithms that we classify as being SCRAM (Scalable Collision-avoiding Role Assignment with Minimal-makespan). SCRAM role assignment algorithms use a graph-theoretic approach to map agents to target goal locations such that our objectives for both minimizing the makespan and avoiding agent collisions are met. A system using SCRAM role assignment was originally designed to allow for decentralized coordination among physically realistic simulated humanoid soccer-playing robots in the partially observable, non-deterministic, noisy, dynamic, and limited communication setting of the RoboCup 3D simulation league. In its current form, SCRAM role assignment generalizes well to many realistic and real-world multiagent systems, and scales to thousands of agents.
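
The makespan objective described in this abstract can be illustrated with a standard bottleneck assignment: binary search over candidate distances combined with bipartite matching. This is a sketch of that textbook technique only, not the SCRAM algorithms. It assumes equal robot speeds (so distance stands in for travel time), equal numbers of robots and targets, and it ignores collision avoidance; the function name `min_makespan_assignment` is hypothetical.

```python
import math


def min_makespan_assignment(robots, targets):
    """Return (makespan, assignment); assignment[i] is robot i's target index."""
    n = len(robots)
    dist = [[math.dist(r, t) for t in targets] for r in robots]
    thresholds = sorted({d for row in dist for d in row})

    def match(limit):
        # Perfect matching using only robot-target pairs no farther than `limit`.
        owner = [-1] * n  # owner[j] is the robot currently holding target j

        def try_assign(i, seen):
            for j in range(n):
                if dist[i][j] <= limit and not seen[j]:
                    seen[j] = True
                    if owner[j] == -1 or try_assign(owner[j], seen):
                        owner[j] = i
                        return True
            return False

        for i in range(n):
            if not try_assign(i, [False] * n):
                return None
        assignment = [0] * n
        for j, i in enumerate(owner):
            assignment[i] = j
        return assignment

    # Binary search for the smallest distance that still admits a perfect matching.
    lo, hi = 0, len(thresholds) - 1
    best = match(thresholds[hi])
    while lo < hi:
        mid = (lo + hi) // 2
        result = match(thresholds[mid])
        if result is not None:
            hi, best = mid, result
        else:
            lo = mid + 1
    return thresholds[lo], best


if __name__ == "__main__":
    print(min_makespan_assignment([(0, 0), (10, 0)], [(9, 0), (1, 0)]))
```

Each matching check is polynomial in the number of robots, and the search tries only a logarithmic number of thresholds, so the whole sketch runs in polynomial time.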


RoboCup Soccer Leagues

AI Magazine

RoboCup was created in 1996 by a group of Japanese, American, and European artificial intelligence and robotics researchers with a formidable, visionary long-term challenge: By 2050 a team of robot soccer players will beat the human World Cup champion team. In this article, we focus on RoboCup robot soccer, and present its five current leagues, which address complementary scientific challenges through different robot and physical setups. Full details on the status of the RoboCup soccer leagues, including league history and past results, upcoming competitions, and detailed rules and specifications are available from the league homepages and wikis.

