Collective Inference on Markov Models for Modeling Bird Migration

Neural Information Processing Systems

We investigate a family of inference problems on Markov models, where many sample paths are drawn from a Markov chain and partial information is revealed to an observer who attempts to reconstruct the sample paths. We present algorithms and hardness results for several variants of this problem which arise by revealing different information to the observer and imposing different requirements for the reconstruction of sample paths. Our algorithms are analogous to the classical Viterbi algorithm for Hidden Markov Models, which finds the single most probable sample path given a sequence of observations. Our work is motivated by an important application in ecology: inferring bird migration paths from a large database of observations.
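For reference, the classical Viterbi algorithm mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard single-path algorithm only, not the collective inference algorithms of the paper; the function name `viterbi`, the two-state toy model, and all probabilities are assumptions made up for the example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path and its joint probability.

    Works in log space to avoid underflow on long observation sequences.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]

    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((r, best[-1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            scores[s] = score + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(best[-1][last])


# Hypothetical toy model (all numbers are illustrative assumptions).
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, prob)
```

The dynamic program keeps, for each time step and state, only the best-scoring path ending there, so the cost is linear in the sequence length and quadratic in the number of states.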


Boosting Algorithms for Maximizing the Soft Margin

Neural Information Processing Systems

Gunnar Rätsch, Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany

We present a novel boosting algorithm, called SoftBoost, designed for sets of binary labeled examples that are not necessarily separable by convex combinations of base hypotheses. Our algorithm achieves robustness by capping the distributions on the examples. Our update of the distribution is motivated by minimizing a relative entropy subject to the capping constraints and constraints on the edges of the obtained base hypotheses. The capping constraints imply a soft margin in the dual optimization problem. Our algorithm produces a convex combination of hypotheses whose soft margin is within δ of its maximum.


Discriminative Batch Mode Active Learning

Neural Information Processing Systems

Active learning sequentially selects unlabeled instances to label with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance at a time while retraining in each iteration. However, single instance selection systems are unable to exploit a parallelized labeler when one is available. Recently a few batch mode active learning approaches have been proposed that select a set of most informative unlabeled instances in each iteration, guided by some heuristic scores. In this paper, we propose a discriminative batch mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables. The optimization is formulated to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account. Although the objective is not convex, we can apply a quasi-Newton method to obtain a good local solution. Our empirical studies on UCI datasets show that the proposed active learning is more effective than current state-of-the-art batch mode active learning algorithms.


Efficient Principled Learning of Thin Junction Trees

Neural Information Processing Systems

We present the first truly polynomial algorithm for learning the structure of bounded-treewidth junction trees, an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity, and provides strong theoretical guarantees in terms of KL divergence from the true distribution. We also present a lazy extension of our approach that leads to very significant speedups in practice, and demonstrate the viability of our method empirically on several real-world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of random variables with only a polynomial number of mutual information computations on fixed-size subsets of variables, when the underlying distribution can be approximated by a bounded-treewidth junction tree.


Learning with Transformation Invariant Kernels

Neural Information Processing Systems

This paper considers kernels invariant to translation, rotation and dilation. We show that no nontrivial positive definite (p.d.) kernels exist which are radial and dilation invariant, only conditionally positive definite (c.p.d.) ones. Accordingly, we discuss the c.p.d.


An in-silico Neural Model of Dynamic Routing through Neuronal Coherence

Neural Information Processing Systems

We describe a neurobiologically plausible model to implement dynamic routing using the concept of neuronal communication through neuronal coherence. The model has a three-tier architecture: a raw input tier, a routing control tier, and an invariant output tier. The correct mapping between input and output tiers is realized by an appropriate alignment of the phases of their respective background oscillations by the routing control units. We present an example architecture, implemented on a neuromorphic chip, that is able to achieve circular-shift invariance.


CPR for CSPs: A Probabilistic Relaxation of Constraint Propagation

Neural Information Processing Systems

This paper proposes constraint propagation relaxation (CPR), a probabilistic approach to classical constraint propagation that provides another view on the whole parametric family of survey propagation algorithms SP(ρ), ranging from belief propagation (ρ = 0) to (pure) survey propagation (ρ = 1). More importantly, the approach elucidates the implicit but fundamental assumptions underlying SP(ρ), thus shedding some light on its effectiveness and leading to applications beyond k-SAT.


GRIFT: A graphical model for inferring visual classification features from human data

Neural Information Processing Systems

This paper describes a new model for human visual classification that enables the recovery of image features that explain human subjects' performance on different visual classification tasks. Unlike previous methods, this algorithm does not model their performance with a single linear classifier operating on raw image pixels. Instead, it models classification as the combination of multiple feature detectors. This approach extracts more information about human visual classification than has been previously possible with other methods and provides a foundation for further exploration.


Classification via Minimum Incremental Coding Length (MICL)

Neural Information Processing Systems

We present a simple new criterion for classification, based on principles from lossy data compression. The criterion assigns a test sample to the class that uses the minimum number of additional bits to code the test sample, subject to an allowable distortion. We prove asymptotic optimality of this criterion for Gaussian data and analyze its relationships to classical classifiers. Theoretical results provide new insights into relationships among popular classifiers such as MAP and RDA, as well as unsupervised clustering methods based on lossy compression [13]. Minimizing the lossy coding length induces a regularization effect which stabilizes the (implicit) density estimate in a small-sample setting. Compression also provides a uniform means of handling classes of varying dimension. This simple classification criterion and its kernel and local versions perform competitively against existing classifiers on both synthetic examples and real imagery data such as handwritten digits and human faces, without requiring domain-specific information.


Kernel Measures of Conditional Dependence

Neural Information Processing Systems

We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments.