Cobot: A Social Reinforcement Learning Agent

Neural Information Processing Systems

We report on the use of reinforcement learning with Cobot, a software agent residing in the well-known online community LambdaMOO. Our initial work on Cobot (Isbell et al. 2000) provided him with the ability to collect social statistics and report them to users. Here we describe an application of RL allowing Cobot to take proactive actions in this complex social environment, and adapt behavior from multiple sources of human reward. After 5 months of training, and 3171 reward and punishment events from 254 different LambdaMOO users, Cobot learned nontrivial preferences for a number of users, modifying his behavior based on his current state. Here we describe LambdaMOO and the state and action spaces of Cobot, and report the statistical results of the learning experiment.


Using Vocabulary Knowledge in Bayesian Multinomial Estimation

Neural Information Processing Systems

Recent approaches have used uncertainty over the vocabulary of symbols in a multinomial distribution as a means of accounting for sparsity. We present a Bayesian approach that allows weak prior knowledge, in the form of a small set of approximate candidate vocabularies, to be used to dramatically improve the resulting estimates. We demonstrate these improvements in applications to text compression and estimating distributions over words in newsgroup data.


Improvisation and Learning

Neural Information Processing Systems

This article presents a 2-phase computational learning model and application. As a demonstration, a system has been built, called CHIME for Computer Human Interacting Musical Entity. In phase 1 of training, recurrent back-propagation trains the machine to reproduce 3 jazz melodies. The recurrent network is expanded and is further trained in phase 2 with a reinforcement learning algorithm and a critique produced by a set of basic rules for jazz improvisation.


Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference

Neural Information Processing Systems

This conditional expected claim amount is called the pure premium and it is the basis of the gross premium charged to the insured. This expected value is conditioned on information available about the insured and about the contract, which we call the input profile here. This regression problem is difficult for several reasons: a large number of examples, a large number of variables (most of which are discrete and multi-valued), non-stationarity of the distribution, and a conditional distribution of the dependent variable which is very different from those usually encountered in typical applications.


Tempo tracking and rhythm quantization by sequential Monte Carlo

Neural Information Processing Systems

We present a probabilistic generative model for timing deviations in expressive music. The structure of the proposed model is equivalent to a switching state space model. We formulate two well-known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization), as filtering and maximum a posteriori (MAP) state estimation tasks. The inferences are carried out using sequential Monte Carlo integration (particle filtering) techniques. For this purpose, we have derived a novel Viterbi algorithm for Rao-Blackwellized particle filters, where a subset of the hidden variables is integrated out.


Bayesian Predictive Profiles With Applications to Retail Transaction Data

Neural Information Processing Systems

Massive transaction data sets are recorded in a routine manner in telecommunications, retail commerce, and Web site management. In this paper we address the problem of inferring predictive individual profiles from such historical transaction data. We describe a generative mixture model for count data and use an approximate Bayesian estimation framework that effectively combines an individual's specific history with more general population patterns. We use a large real-world retail transaction data set to illustrate how these profiles consistently outperform non-mixture and non-Bayesian techniques in predicting customer behavior in out-of-sample data.
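
The idea of combining an individual's history with population patterns can be illustrated with a much simpler stand-in than the mixture model the abstract describes: a single multinomial per customer, smoothed toward population frequencies. This is only a sketch under that assumption. The function name `smoothed_profile`, the prior strength `alpha`, and the toy item counts are all hypothetical and do not come from the paper.

```python
from collections import Counter


def smoothed_profile(individual_counts, population_counts, alpha=10.0):
    """Blend one customer's item counts with population-wide frequencies.

    alpha is a hypothetical prior strength: it acts like alpha
    pseudo-purchases drawn from the population distribution.
    Assumes every item the customer bought also appears in the population.
    """
    pop_total = sum(population_counts.values())
    ind_total = sum(individual_counts.values())
    profile = {}
    for item, pop_count in population_counts.items():
        prior = pop_count / pop_total  # population share of this item
        observed = individual_counts.get(item, 0)
        profile[item] = (observed + alpha * prior) / (ind_total + alpha)
    return profile


# Toy data, invented for illustration only.
population = Counter({"shirts": 50, "shoes": 30, "hats": 20})
customer = Counter({"hats": 4})
profile = smoothed_profile(customer, population, alpha=10.0)
```

With few purchases the profile stays close to the population frequencies, and as a customer's history grows the observed counts dominate. Items the customer never bought still receive nonzero probability.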


Model Based Population Tracking and Automatic Detection of Distribution Changes

Neural Information Processing Systems

Probabilistic mixture models are used for a broad range of data analysis tasks such as clustering, classification, and predictive modeling. Due to their inherent probabilistic nature, mixture models can easily be combined with other probabilistic or non-probabilistic techniques, thus forming more complex data analysis systems. In the case of online data (where there is a stream of data available), models can be constantly updated to reflect the most current distribution of the incoming data. However, in many business applications the models themselves represent a parsimonious summary of the data, and therefore it is not desirable to change models frequently, much less with every new data point. In such a framework it becomes crucial to track the applicability of the mixture model and detect the point in time when the model fails to adequately represent the data. In this paper we formulate the problem of change detection and propose a principled solution. Empirical results over both synthetic and real-life data sets are presented.


Grouping with Bias

Neural Information Processing Systems

In image segmentation, it means finding objects or object segments by clustering pixels and segregating them from the background. It is often considered a bottom-up process. Although never explicitly stated, higher-level knowledge, such as familiar object shapes, is to be used only in a separate post-processing step. The need for the integration of prior knowledge arises in a number of applications. In computer vision, we would like image segmentation to correspond directly to object segmentation.


Switch Packet Arbitration via Queue-Learning

Neural Information Processing Systems

In packet switches, packets queue at switch inputs and contend for outputs. The contention arbitration policy directly affects switch performance. The best policy depends on the current state of the switch and current traffic patterns. This problem is hard because the state space, possible transitions, and set of actions all grow exponentially with the size of the switch. We present a reinforcement learning formulation of the problem that decomposes the value function into many small independent value functions and enables an efficient action selection.