Goto

Collaborating Authors

 Country



Bayesian Predictive Profiles With Applications to Retail Transaction Data

Neural Information Processing Systems

Massive transaction data sets are recorded in a routine manner in telecommunications, retail commerce, and Web site management. In this paper we address the problem of inferring predictive individual profilesfrom such historical transaction data. We describe a generative mixture model for count data and use an an approximate Bayesian estimation framework that effectively combines anindividual's specific history with more general population patterns. We use a large real-world retail transaction data set to illustrate how these profiles consistently outperform non-mixture and non-Bayesian techniques in predicting customer behavior in out-of-sample data.


Agglomerative Multivariate Information Bottleneck

Neural Information Processing Systems

The information bottleneck method is an unsupervised model independent data organization technique. Given a joint distribution peA, B), this method constructs anew variable T that extracts partitions, or clusters, over the values of A that are informative about B. In a recent paper, we introduced a general principled frameworkfor multivariate extensions of the information bottleneck method that allows us to consider multiple systems of data partitions that are interrelated. In this paper, we present a new family of simple agglomerative algorithms to construct such systems of interrelated clusters. We analyze the behavior of these algorithms and apply them to several real-life datasets.


Grammatical Bigrams

Neural Information Processing Systems

Unsupervised learning algorithms have been derived for several statistical modelsof English grammar, but their computational complexity makesapplying them to large data sets intractable. This paper presents a probabilistic model of English grammar that is much simpler than conventional models, but which admits an efficient EMtraining algorithm. The model is based upon grammatical bigrams, i.e., syntactic relationships between pairs of words. We present the results of experiments that quantify the representational adequacyof the grammatical bigram model, its ability to generalize from labelled data, and its ability to induce syntactic structure from large amounts of raw text. 1 Introduction One of the most significant challenges in learning grammars from raw text is keeping thecomputational complexity manageable. For example, the EM algorithm for the unsupervised training of Probabilistic Context-Free Grammars-known as the Inside-Outside algorithm-has been found in practice to be "computationally intractable for realistic problems" [1].


Stabilizing Value Function Approximation with the BFBP Algorithm

Neural Information Processing Systems

Our BFBP (Batch Fit to Best Paths) algorithm alternates between an exploration phase (during which trajectories are generated to try to find fragments of the optimal policy) and a function fitting phase (during which a function approximator is fit to the best known paths from start states to terminal states). An advantage of this approach is that batch value-function fitting is a global process, which allows it to address the tradeoffs in function approximation that cannot be handled by local, online algorithms.


The Steering Approach for Multi-Criteria Reinforcement Learning

Neural Information Processing Systems

We consider the problem of learning to attain multiple goals in a dynamic environment, whichis initially unknown. In addition, the environment may contain arbitrarily varying elements related to actions of other agents or to non-stationary moves of Nature. This problem is modelled as a stochastic (Markov) game between the learning agent and an arbitrary player, with a vector-valued reward function. The objective of the learning agent is to have its long-term average reward vector belong to a given target set. We devise an algorithm for achieving this task, which is based on the theory of approachability for stochastic games. This algorithm combines, inan appropriate way, a finite set of standard, scalar-reward learning algorithms. Sufficientconditions are given for the convergence of the learning algorithm to a general target set. The specialization of these results to the single-controller Markov decision problem are discussed as well.


Predictive Representations of State

Neural Information Processing Systems

We show that states of a dynamical system can be usefully represented bymulti-step, action-conditional predictions of future observations. Staterepresentations that are grounded in data in this way may be easier to learn, generalize better, and be less dependent onaccurate prior models than, for example, POMDP state representations. Building on prior work by Jaeger and by Rivest and Schapire, in this paper we compare and contrast a linear specialization ofthe predictive approach with the state representations used in POMDPs and in k-order Markov models. Ours is the first specific formulation of the predictive idea that includes both stochasticity and actions (controls). We show that any system has a linear predictive state representation with number of predictions no greater than the number of states in its minimal POMDP model.


Model-Free Least-Squares Policy Iteration

Neural Information Processing Systems

We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off policy. We are motivated by the least squares temporal difference learning algorithm (LSTD), which is known for its efficient use of sample experiences compared to pure temporal difference algorithms. LSTD is ideal for prediction problems, however it heretofore has not had a straightforward application to control problems. Moreover, approximations learned by LSTD are strongly influenced by the visitation distribution over states.


Incremental A*

Neural Information Processing Systems

Incremental search techniques find optimal solutions to series of similar search tasks much faster than is possible by solving each search task from scratch. While researchers have developed incremental versions of uninformed search methods, we develop an incremental version of A*. The first search of Lifelong Planning A* is the same as that of A* but all subsequent searches are much faster because it reuses those parts of the previous search tree that are identical to the new search tree. We then present experimental results that demonstrate the advantages of Lifelong Planning A* for simple route planning tasks.


A Natural Policy Gradient

Neural Information Processing Systems

Sham Kakade Gatsby Computational Neuroscience Unit 17 Queen Square, London, UK WC1N 3AR http://www.gatsby.ucl.ac.uk sham@gatsby.ucl.ac.uk Abstract We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space.Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient ismoving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton etal. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris. 1 Introduction There has been a growing interest in direct policy-gradient methods for approximate planning in large Markov decision problems (MDPs). Unfortunately, the standard gradient descent rule is noncovariant. In this paper, we present a covariant gradient by defining a metric based on the underlying structure of the policy.