Andre, David


Model-Based Bayesian Exploration

arXiv.org Artificial Intelligence

Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information - the expected improvement in future decision quality arising from the information acquired by exploration. Estimating this quantity requires an assessment of the agent's uncertainty about its current value estimates for states. In this paper we investigate ways of representing and reasoning about this uncertainty in algorithms where the system attempts to learn a model of its environment. We explicitly represent uncertainty about the parameters of the model and build probability distributions over Q-values based on these. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation.


Machine Learning and Sensor Fusion for Estimating Continuous Energy Expenditure

AI Magazine

In this article we provide insight into the BodyMedia FIT armband system -- a wearable multi-sensor technology that continuously monitors physiological events related to energy expenditure for weight management using machine learning and data modeling methods. Since becoming commercially available in 2001, more than half a million users have used the system to track their physiological parameters and to achieve their individual health goals including weight-loss. We describe several challenges that arise in applying machine learning techniques to the health care domain and present various solutions utilized in the armband system. We demonstrate how machine learning and multi-sensor data fusion techniques are critical to the system's success.


Machine Learning and Sensor Fusion for Estimating Continuous Energy Expenditure

AI Magazine

In this article we provide insight into the BodyMedia FIT armband system — a wearable multi-sensor technology that continuously monitors physiological events related to energy expenditure for weight management using machine learning and data modeling methods. Since becoming commercially available in 2001, more than half a million users have used the system to track their physiological parameters and to achieve their individual health goals including weight-loss. We describe several challenges that arise in applying machine learning techniques to the health care domain and present various solutions utilized in the armband system. We demonstrate how machine learning and multi-sensor data fusion techniques are critical to the system’s success.


A compact, hierarchical Q-function decomposition

arXiv.org Artificial Intelligence

Previous work in hierarchical reinforcement learning has faced a dilemma: either ignore the values of different possible exit states from a subroutine, thereby risking suboptimal behavior, or represent those values explicitly thereby incurring a possibly large representation cost because exit values refer to nonlocal aspects of the world (i.e., all subsequent rewards). This paper shows that, in many cases, one can avoid both of these problems. The solution is based on recursively decomposing the exit value function in terms of Q-functions at higher levels of the hierarchy. This leads to an intuitively appealing runtime architecture in which a parent subroutine passes to its child a value function on the exit states and the child reasons about how its choices affect the exit value. We also identify structural conditions on the value function and transition distributions that allow much more concise representations of exit state distributions, leading to further state abstraction. In essence, the only variables whose exit values need be considered are those that the parent cares about and the child affects. We demonstrate the utility of our algorithms on a series of increasingly complex environments.


Machine Learning and Sensor Fusion for Estimating Continuous Energy Expenditure

AAAI Conferences

In this paper we provide insight into the BodyMedia FIT® armband system — a wearable multi-sensor technology that achieves the goals of continuous physiological monitoring (especially energy expenditure estimation) and weight management using machine learning and data modeling methods. This system has been commercially available since 2001 and more than half a million users have used the system to track their physiological parameters and to achieve their individual health goals including weight-loss. We describe several challenges that arise in applying machine learning techniques to the health care domain and present various solutions utilized in the armband system. We demonstrate how machine learning and multi-sensor data fusion techniques are critical to the system’s succ


Staff Scheduling for Inbound Call and Customer Contact Centers

AI Magazine

The staff scheduling problem is a critical problem in the call center (or, more generally, customer contact center) industry. This article describes DIRECTOR, a staff scheduling system for contact centers. DIRECTOR is a constraint-based system that uses AI search techniques to generate schedules that satisfy and optimize a wide range of constraints and service-quality metrics. DIRECTOR has successfully been deployed at more than 800 contact centers, with significant measurable benefits, some of which are documented in case studies included in this article.


Staff Scheduling for Inbound Call and Customer Contact Centers

AI Magazine

The staff scheduling problem is a critical problem in the call center (or, more generally, customer contact center) industry. This article describes DIRECTOR, a staff scheduling system for contact centers. DIRECTOR is a constraint-based system that uses AI search techniques to generate schedules that satisfy and optimize a wide range of constraints and service-quality metrics. DIRECTOR has successfully been deployed at more than 800 contact centers, with significant measurable benefits, some of which are documented in case studies included in this article.


Programmable Reinforcement Learning Agents

Neural Information Processing Systems

We present an expressive agent design language for reinforcement learning thatallows the user to constrain the policies considered by the learning process.Thelanguage includes standard features such as parameterized subroutines,temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning that which isn't specified, we present provably convergent learning algorithms. Wedemonstrate by example that agent programs written in the language are concise as well as modular. This facilitates state abstraction and the transferability of learned skills. 1 Introduction The field of reinforcement learning has recently adopted the idea that the application of prior knowledge may allow much faster learning and may indeed be essential if realworld environmentsare to be addressed. For learning behaviors, the most obvious form of prior knowledge provides a partial description of desired behaviors. Several languages for partial descriptions have been proposed, including Hierarchical Abstract Machines (HAMs) [8], semi-Markov options [12], and the MAXQ framework [4]. This paper describes extensions to the HAM language that substantially increase its expressive power,using constructs borrowed from programming languages. Obviously, increasing expressivenessmakes it easier for the user to supply whatever prior knowledge is available, and to do so more concisely.


Generalized Prioritized Sweeping

Neural Information Processing Systems

Prioritized sweeping is a model-based reinforcement learning method that attempts to focus an agent's limited computational resources to achieve a good estimate of the value of environment states. To choose effectively whereto spend a costly planning step, classic prioritized sweeping uses a simple heuristic to focus computation on the states that are likely to have the largest errors. In this paper, we introduce generalized prioritized sweeping, a principled method for generating such estimates in a representation-specific manner. This allows us to extend prioritized sweeping beyond an explicit, state-based representation to deal with compact representationsthat are necessary for dealing with large state spaces. We apply this method for generalized model approximators (such as Bayesian networks), and describe preliminary experiments that compare our approach with classical prioritized sweeping.