Gibson, Richard (University of Alberta) | Lanctot, Marc (University of Alberta) | Burch, Neil (University of Alberta) | Szafron, Duane (University of Alberta) | Bowling, Michael (University of Alberta)
In large extensive form games with imperfect information, Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing approximate Nash equilibria. While the base algorithm performs a full tree traversal on each iteration, Monte Carlo CFR (MCCFR) reduces the per iteration time cost by traversing just a sampled portion of the tree. On the other hand, MCCFR's sampled values introduce variance, and the effects of this variance were previously unknown. In this paper, we generalize MCCFR by considering any generic estimator of the sought values. We show that any choice of an estimator can be used to probabilistically minimize regret, provided the estimator is bounded and unbiased. In addition, we relate the variance of the estimator to the convergence rate of an algorithm that calculates regret directly from the estimator. We demonstrate the application of our analysis by defining a new bounded, unbiased estimator with empirically lower variance than MCCFR estimates. Finally, we use this estimator in a new sampling algorithm to compute approximate equilibria in Goofspiel, Bluff, and Texas hold'em poker. Under each of our selected sampling schemes, our new algorithm converges faster than MCCFR.
Reflections allow us to inspect and modify a program's structure at its own runtime. In this article we will be looking at some parts of the go reflect package API and applying them to a real-world use case by building a generic application configuration mechanism. We have implemented a Go data analysis application which will take data from an inventory database of a bookshop, process it, and turn it into human-readable statistical models that reflect the status of our inventory. The output could, for example, be a list of books that were published by a certain author, or a bar chart depicting the number of books in our inventory per decade of publishing. What we want is a way for us to configure what the application generates from the raw data in an abstract manner. There might be hundreds of algorithms that all process the raw data in different ways and produce different outputs, but not all outputs are relevant to us every time we run the application. We want to be able to configure our application to suit our needs and then just let it do what it has to do in order to deliver what we want from it.
Planning with temporally extended goals and uncontrollable events has recently been introduced as a formal model for system reconfiguration problems. An important application is to automatically reconfigure a real-life system in such a way that its subsequent internal evolution is consistent with a temporal goal formula. In this paper we introduce an incremental search algorithm and a search-guidance heuristic, two generic planning enhancements. An initial problem is decomposed into a series of subproblems, providing two main ways of speeding up a search. Firstly, a subproblem focuses on a part of the initial goal. Secondly, a notion of action relevance allows to explore with higher priority actions that are heuristically considered to be more relevant to the subproblem at hand. Even though our techniques are more generally applicable, we restrict our attention to planning with temporally extended goals and uncontrollable events. Our ideas are implemented on top of a successful previous system that performs online learning to better guide planning and to safely avoid potentially expensive searches. In experiments, the system speed performance is further improved by a convincing margin.
This paper introduces an innovative approach for automated negotiating using the gender of human opponents. Our approach segments the information acquired from previous opponents, stores it in two databases, and models the typical behavior of males and of females. The two models are used in order to match an optimal strategy to each of the two subpopulations. In addition to the basic separation, we propose a learning algorithm which supplies an online indicator for the gender separability-level of the population, which tunes the level of separation the algorithm activates. The algorithm we present can be generally applied in different environments with no need for configuration of parameters. Experiments in 4 different one-shot domains, comparing the performance of the gender based separation approach with a basic approach which is not gender sensitive, revealed higher payoffs of the former in almost all the domains. Moreover, using the proposed learning algorithm further improved the results.
This paper presents a new approach which addresses the issue of inconsistent message exchange during agent interactions. We advocate that in agent-based service-oriented computing systems, only agents should be in charge of executing interactions. We also require that the architecture of an agent be clearly separated in two distinct parts, a public and a private parts. The public part contains the interaction model, while any other data and process the agent needs belong to the private part. Our solution consists of automatically constructing the interaction model. It is based on a unification of the actions, required of an agent playing a role in a generic protocol, and the functionalities abstracted from the BPEL4WS model of this agent. We present the algorithms to perform this unification as well as the abstract models they manipulate.