
Collaborating Authors

 Liu, Yun-En


Where to Add Actions in Human-in-the-Loop Reinforcement Learning

AAAI Conferences

In order for reinforcement learning systems to learn quickly in vast action spaces, such as the space of all possible pieces of text or the space of all images, leveraging human intuition and creativity is key. However, a human-designed action space is likely to be initially imperfect and limited; furthermore, humans may improve at creating useful actions with practice or new information. Therefore, we propose a framework in which a human adds actions to a reinforcement learning system over time to boost performance. In this setting, however, it is crucial that we use human effort as efficiently as possible, and one significant danger is that humans waste effort adding actions at places (states) that are not very important. Therefore, we propose Expected Local Improvement (ELI), an automated method that selects the states at which to query humans for a new action. We evaluate ELI on a variety of simulated domains adapted from the literature, including domains with over a million actions and domains where the simulated experts change over time. We find that ELI demonstrates excellent empirical performance, even in settings where the synthetic "experts" are quite poor.
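
The abstract does not spell out the ELI criterion, so the sketch below only illustrates the general shape of the idea: score each state by how much an added action might plausibly improve it, weighted by how often the state is reached, and query the human at the highest-scoring state. The scoring rule, function names, and data layout are assumptions for illustration, not the paper's method.

    def select_query_state(states, q_values, v_optimistic, visit_counts):
        """Pick the state where asking a human for a new action seems most useful.

        q_values[s]     : value estimates for the actions currently available in s
        v_optimistic[s] : optimistic estimate of the best achievable value in s
        visit_counts[s] : how often s is reached, used to weight its importance
        """
        def expected_improvement(s):
            # Gap between what seems achievable and the best action we already have.
            best_current = max(q_values[s]) if len(q_values[s]) > 0 else float("-inf")
            gap = max(0.0, v_optimistic[s] - best_current)
            return visit_counts[s] * gap

        # Query the human at the state with the largest expected local improvement.
        return max(states, key=expected_improvement)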


Crowdsourcing Accurate and Creative Word Problems and Hints

AAAI Conferences

The current state-of-the-art method for generating educational content, such as math word problems and hints, is manual authoring by domain experts. Unfortunately, this is costly, time-consuming, and produces content that lacks diversity. Attempts to address the time and diversity issues automatically, through natural language generation, still do not produce content that is sufficiently creative and varied. Crowdsourcing is a viable alternative: there has been a great deal of research on leveraging human creativity to solve complex problems, such as user interface design. However, these systems typically decompose complex tasks into subtasks. Writing a single word problem or hint is small enough that it is unclear how to break it down further, yet far more complex than typical microtasks like image labeling. It is therefore not obvious how to apply these worker-support methods, or which of them (if any) are effective. We build upon successful task-design factors from prior work and run a series of iterative studies, incrementally adding different worker-support elements. Our results show that successive task designs improved accuracy and creativity.


Offline Evaluation of Online Reinforcement Learning Algorithms

AAAI Conferences

In many real-world reinforcement learning problems, we have access to an existing dataset and would like to use it to evaluate various learning approaches. Typically, one would prefer not to deploy a fixed policy, but rather an algorithm that learns to improve its behavior as it gains more experience. Therefore, we seek to evaluate how a proposed algorithm learns in our environment; that is, we need to estimate how the algorithm would have gathered experience had it been run online. In this work, we develop three new evaluation approaches that guarantee that, given some history, algorithms are fed samples from the distribution that they would have encountered if they were run online. Additionally, we are the first to propose an approach that is provably unbiased given finite data, eliminating bias due to the length of the evaluation. Finally, we compare the sample-efficiency of these approaches on multiple datasets, including one from a real-world deployment of an educational game.
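
As a rough illustration of the replay idea behind such evaluators, the sketch below feeds a learning algorithm only those logged outcomes it would have generated itself, and stops as soon as no matching sample remains. The per-action queues and the select_action/update interface are assumed placeholders; they are not the paper's estimators, which additionally address bias from the evaluation length.

    from collections import defaultdict, deque

    def evaluate_offline(algorithm, logged_data):
        """logged_data: iterable of (action, reward) pairs collected earlier."""
        queues = defaultdict(deque)
        for action, reward in logged_data:
            queues[action].append(reward)

        total_reward, steps = 0.0, 0
        while True:
            action = algorithm.select_action()
            if not queues[action]:      # no unused sample matches this choice
                break                   # stop rather than substitute a mismatched one
            reward = queues[action].popleft()
            algorithm.update(action, reward)   # the algorithm learns as if run online
            total_reward += reward
            steps += 1
        return total_reward, steps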


The Queue Method: Handling Delay, Heuristics, Prior Data, and Evaluation in Bandits

AAAI Conferences

Current algorithms for the standard multi-armed bandit problem have good empirical performance and optimal regret bounds. However, real-world problems often differ from the standard formulation in several ways. First, feedback may be delayed instead of arriving immediately. Second, the real world often contains structure that suggests heuristics, which we wish to incorporate while retaining the best-known theoretical guarantees. Third, we may wish to make use of an arbitrary prior dataset without negatively impacting performance. Fourth, we may wish to efficiently evaluate algorithms using a previously collected dataset. Surprisingly, these seemingly disparate problems can all be addressed using algorithms inspired by a recently developed queueing technique. We present the Stochastic Delayed Bandits (SDB) algorithm as a solution to these four problems; it takes black-box bandit algorithms (including heuristic approaches) as input while achieving good theoretical guarantees. We present empirical results from both synthetic simulations and real-world data drawn from an educational game. Our results show that SDB outperforms state-of-the-art approaches to handling delay, heuristics, prior data, and evaluation.
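
A minimal sketch of the queueing idea for delayed feedback, under the assumption of a black-box base algorithm exposing select_arm and update methods (an illustrative interface, not SDB itself): rewards that arrive late are parked in per-arm queues, and the base algorithm is only advanced when a queued reward for its chosen arm is available.

    from collections import defaultdict, deque

    class DelayedFeedbackWrapper:
        def __init__(self, base_algorithm):
            self.base = base_algorithm
            self.queues = defaultdict(deque)   # arm -> arrived but unconsumed rewards

        def record_arrival(self, arm, reward):
            """Call whenever a (possibly delayed) reward finally arrives."""
            self.queues[arm].append(reward)

        def choose_arm(self):
            """Advance the base algorithm on queued rewards, then pick the next arm to pull."""
            arm = self.base.select_arm()
            while self.queues[arm]:
                self.base.update(arm, self.queues[arm].popleft())
                arm = self.base.select_arm()
            return arm   # this pull's reward may arrive later via record_arrival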


Evaluating Competitive Game Balance with Restricted Play

AAAI Conferences

Game balancing is the fine-tuning phase in which a functioning game is adjusted to be deep, fair, and interesting. Balancing is difficult and time-consuming, as designers must repeatedly tweak parameters and run lengthy playtests to evaluate the effects of those changes. If designers could receive immediate feedback on their designs, they could explore a vast space of variations and select only the most promising games for playtesting. Such automated design feedback has been difficult to achieve, as there is no mathematical formulation of game balance that unifies many of its forms. We argue for a formulation in which carefully restricted agents are played against standard agents. We develop this restricted-play balance framework and evaluate its utility by building a tool capable of calculating measures of balance for a large family of games. By applying this tool to an educational card game, we demonstrate how the framework and tool allow designers to rapidly evaluate and iterate on the balance of their games.
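
As a concrete example of a restricted-play balance measure, the sketch below estimates the win rate of an agent forbidden from using some element of the game (a card, a move, a strategy) against an unrestricted agent; a win rate near 0.5 suggests the restricted element matters little, while a very low one suggests the game depends heavily on it. The play_game and agent-constructor interfaces are assumed placeholders rather than the authors' tool.

    import random

    def restricted_play_balance(make_standard_agent, make_restricted_agent,
                                play_game, n_games=1000):
        """Estimate the restricted agent's win rate against a standard agent."""
        wins = 0
        for _ in range(n_games):
            restricted_first = random.random() < 0.5        # randomize seating
            if restricted_first:
                winner = play_game(make_restricted_agent(), make_standard_agent())
                wins += int(winner == 0)                     # player 0 is restricted
            else:
                winner = play_game(make_standard_agent(), make_restricted_agent())
                wins += int(winner == 1)                     # player 1 is restricted
        return wins / n_games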