Goto

Collaborating Authors

 Learning Graphical Models


PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data

Neural Information Processing Systems

We propose a "soft greedy" learning algorithm for building small conjunctions of simple threshold functions, called rays, defined on single real-valued attributes. We also propose a PAC-Bayes risk bound which is minimized for classifiers achieving a nontrivial tradeoff between sparsity (the number of rays used) and the magnitude ofthe separating margin of each ray. Finally, we test the soft greedy algorithm on four DNA micro-array data sets.



Schema Learning: Experience-Based Construction of Predictive Action Models

Neural Information Processing Systems

Schema learning is a way to discover probabilistic, constructivist, predictive actionmodels (schemas) from experience. It includes methods for finding and using hidden state to make predictions more accurate. Weextend the original schema mechanism [1] to handle arbitrary discrete-valued sensors, improve the original learning criteria to handle POMDP domains, and better maintain hidden state by using schema predictions. Theseextensions show large improvement over the original schema mechanism in several rewardless POMDPs, and achieve very low prediction error in a difficult speech modeling task. Further, we compare extended schema learning to the recently introduced predictive state representations [2],and find their predictions of next-step action effects to be approximately equal in accuracy. This work lays the foundation for a schema-based system of integrated learning and planning.


A Probabilistic Model for Online Document Clustering with Application to Novelty Detection

Neural Information Processing Systems

In this paper we propose a probabilistic model for online document clustering. Weuse nonparametric Dirichlet process prior to model the growing number of clusters, and use a prior of general English language model as the base distribution to handle the generation of novel clusters. Furthermore, cluster uncertainty is modeled with a Bayesian Dirichletmultinomial distribution.We use empirical Bayes method to estimate hyperparameters based on a historical dataset. Our probabilistic model is applied to the novelty detection task in Topic Detection and Tracking (TDT) and compared with existing approaches in the literature.





Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes

Neural Information Processing Systems

We propose the hierarchical Dirichlet process (HDP), a nonparametric Bayesian model for clustering problems involving multiple groups of data. Each group of data is modeled with a mixture, with the number of components being open-ended and inferred automatically by the model. Further, components can be shared across groups, allowing dependencies across groups to be modeled effectively as well as conferring generalization tonew groups. Such grouped clustering problems occur often in practice, e.g. in the problem of topic discovery in document corpora. We report experimental results on three text corpora showing the effective and superior performance of the HDP over previous models.


Constraining a Bayesian Model of Human Visual Speed Perception

Neural Information Processing Systems

It has been demonstrated that basic aspects of human visual motion perception arequalitatively consistent with a Bayesian estimation framework, where the prior probability distribution on velocity favors slow speeds. Here, we present a refined probabilistic model that can account for the typical trial-to-trial variabilities observed in psychophysical speed perception experiments. We also show that data from such experiments can be used to constrain both the likelihood and prior functions of the model. Specifically, we measured matching speeds and thresholds in a two-alternative forced choice speed discrimination task. Parametric fits to the data reveal that the likelihood function is well approximated by a LogNormal distribution with a characteristic contrast-dependent variance, andthat the prior distribution on velocity exhibits significantly heavier tails than a Gaussian, and approximately follows a power-law function.