Goto

Collaborating Authors

 Europe


Reinforcement Learning and Time Perception -- a Model of Animal Experiments

Neural Information Processing Systems

Animal data on delayed-reward conditioning experiments shows a striking property - the data for different time intervals collapses into a single curve when the data is scaled by the time interval. This is called the scalar property of interval timing. Here a simple model of a neural clock is presented and shown to give rise to the scalar property. The model is an accumulator consisting of noisy, linear spiking neurons. It is analytically tractable and contains only three parameters.


Grammatical Bigrams

Neural Information Processing Systems

Unsupervised learning algorithms have been derived for several statistical models of English grammar, but their computational complexity makes applying them to large data sets intractable. This paper presents a probabilistic model of English grammar that is much simpler than conventional models, but which admits an efficient EM training algorithm. The model is based upon grammatical bigrams, i.e., syntactic relationships between pairs of words. We present the results of experiments that quantify the representational adequacy of the grammatical bigram model, its ability to generalize from labelled data, and its ability to induce syntactic structure from large amounts of raw text. 1 Introduction One of the most significant challenges in learning grammars from raw text is keeping the computational complexity manageable. For example, the EM algorithm for the unsupervised training of Probabilistic Context-Free Grammars-known as the Inside-Outside algorithm-has been found in practice to be "computationally intractable for realistic problems" [1].


A Model of the Phonological Loop: Generalization and Binding

Neural Information Processing Systems

We present a neural network model that shows how the prefrontal cortex, interacting with the basal ganglia, can maintain a sequence of phonological information in activation-based working memory (i.e., the phonological loop). The primary function of this phonological loop may be to transiently encode arbitrary bindings of information necessary for tasks - the combinatorial expressive power of language enables very flexible binding of essentially arbitrary pieces of information. Our model takes advantage of the closed-class nature of phonemes, which allows different neural representations of all possible phonemes at each sequential position to be encoded. To make this work, we suggest that the basal ganglia provide a region-specific update signal that allocates phonemes to the appropriate sequential coding slot. To demonstrate that flexible, arbitrary binding of novel sequences can be supported by this mechanism, we show that the model can generalize to novel sequences after moderate amounts of training.




Modeling Temporal Structure in Classical Conditioning

Neural Information Processing Systems

The Temporal Coding Hypothesis of Miller and colleagues [7] suggests that animals integrate related temporal patterns of stimuli into single memory representations. We formalize this concept using quasi-Bayes estimation to update the parameters of a constrained hidden Markov model. This approach allows us to account for some surprising temporal effects in the second order conditioning experiments of Miller et al. [1, 2, 3], which other models are unable to explain.


Model Based Population Tracking and Automatic Detection of Distribution Changes

Neural Information Processing Systems

Probabilistic mixture models are used for a broad range of data analysis tasks such as clustering, classification, predictive modeling, etc. Due to their inherent probabilistic nature, mixture models can easily be combined with other probabilistic or non-probabilistic techniques thus forming more complex data analysis systems. In the case of online data (where there is a stream of data available) models can be constantly updated to reflect the most current distribution of the incoming data. However, in many business applications the models themselves represent a parsimonious summary of the data and therefore it is not desirable to change models frequently, much less with every new data point. In such a framework it becomes crucial to track the applicability of the mixture model and detect the point in time when the model fails to adequately represent the data. In this paper we formulate the problem of change detection and propose a principled solution. Empirical results over both synthetic and real-life data sets are presented.



Spectral Relaxation for K-means Clustering

Neural Information Processing Systems

In K-means clusters are represented by centers of mass of their members, and it can be shown that the K-means algorithm of alternating between assigning cluster membership for each data vector to the nearest cluster center and computing the center of each cluster as the centroid of its member data vectors is equivalent to finding the minimum of a sum-of-squares cost function using coordinate descend. Despite the popularity of K means clustering, one of its major drawbacks is that the coordinate descend search method is prone to local minima. Much research has been done on computing refined initial points and adding explicit constraints to the sum-of-squares cost function for K-means clustering so that the search can converge to better local minimum [1,2]. In this paper we tackle the problem from a different angle: we find an equivalent formulation of the sum-of-squares minimization as a trace maximization problem with special constraints; relaxing the constraints leads to a maximization problem that possesses optimal global solutions. As a byproduct we also have an easily computable lower bound for the minimum of the sum-of-squares cost function. Our work is inspired by [9, 3] where connection to Gram matrix and extension of K means method to general Mercer kernels were investigated. The rest of the paper is organized as follows: in section 2, we derive the equivalent trace maximization formulation and discuss its spectral relaxation. In section 3, we discuss how to assign cluster membership using pivoted QR decomposition, taking into account the special structure of the partial eigenvector matrix. Finally, in section 4, we illustrate the performance of the clustering algorithms using document clustering as an example.


Learning Lateral Interactions for Feature Binding and Sensory Segmentation

Neural Information Processing Systems

We present a new approach to the supervised learning of lateral interactions forthe competitive layer model (CLM) dynamic feature binding architecture. The method is based on consistency conditions, which were recently shown to characterize the attractor states of this linear threshold recurrent network. For a given set of training examples the learning problem isformulated as a convex quadratic optimization problem in the lateral interaction weights. An efficient dimension reduction of the learning problem can be achieved by using a linear superposition of basis interactions.