Goto

Collaborating Authors

 Country


Causal Categorization with Bayes Nets

Neural Information Processing Systems

A theory of categorization is presented in which knowledge of causal relationships between category features is represented as a Bayesian network. Referred to as causal-model theory, this theory predicts that objects are classified as category members to the extent they are likely to have been produced by a categorys causal model. On this view, people have models of the world that lead them to expect a certain distribution of features in category members (e.g., correlations between feature pairs that are directly connected by causal relationships), and consider exemplars good category members when they manifest those expectations. These expectations include sensitivity to higher-order feature interactions that emerge from the asymmetries inherent in causal relationships. Research on the topic of categorization has traditionally focused on the problem of learning new categories given observations of category members.


Grammatical Bigrams

Neural Information Processing Systems

Unsupervised learning algorithms have been derived for several statistical models of English grammar, but their computational complexity makes applying them to large data sets intractable. This paper presents a probabilistic model of English grammar that is much simpler than conventional models, but which admits an efficient EM training algorithm. The model is based upon grammatical bigrams, i.e., syntactic relationships between pairs of words. We present the results of experiments that quantify the representational adequacy of the grammatical bigram model, its ability to generalize from labelled data, and its ability to induce syntactic structure from large amounts of raw text. 1 Introduction One of the most significant challenges in learning grammars from raw text is keeping the computational complexity manageable. For example, the EM algorithm for the unsupervised training of Probabilistic Context-Free Grammars-known as the Inside-Outside algorithm-has been found in practice to be "computationally intractable for realistic problems" [1].


A Model of the Phonological Loop: Generalization and Binding

Neural Information Processing Systems

We present a neural network model that shows how the prefrontal cortex, interacting with the basal ganglia, can maintain a sequence of phonological information in activation-based working memory (i.e., the phonological loop). The primary function of this phonological loop may be to transiently encode arbitrary bindings of information necessary for tasks - the combinatorial expressive power of language enables very flexible binding of essentially arbitrary pieces of information. Our model takes advantage of the closed-class nature of phonemes, which allows different neural representations of all possible phonemes at each sequential position to be encoded. To make this work, we suggest that the basal ganglia provide a region-specific update signal that allocates phonemes to the appropriate sequential coding slot. To demonstrate that flexible, arbitrary binding of novel sequences can be supported by this mechanism, we show that the model can generalize to novel sequences after moderate amounts of training.


Generalizable Relational Binding from Coarse-coded Distributed Representations

Neural Information Processing Systems

We present a model of binding of relationship information in a spatial domain (e.g., square above triangle) that uses low-order coarse-coded conjunctive representations instead of more popular temporal synchrony mechanisms. Supporters of temporal synchrony argue that conjunctive representations lack both efficiency (i.e., combinatorial numbers of units are required) and systematicity (i.e., the resulting representations are overly specific and thus do not support generalization to novel exemplars). To counter these claims, we show that our model: a) uses far fewer hidden units than the number of conjunctions represented, by using coarse-coded, distributed representations where each unit has a broad tuning curve through high-dimensional conjunction space, and b) is capable of considerable generalization to novel inputs.


Grammar Transfer in a Second Order Recurrent Neural Network

Neural Information Processing Systems

Furthermore, this effect persists even when the new strings violate the syntactic rule slightly as long as they are similar to the old strings [1]. It has been shown in the past studies that recurrent neural networks also have the ability to generalize previously acquired knowledge to novel inputs. For instance, Dienes et al. ([2]) showed that a neural network can generalize abstract knowledge acquired in one domain to a new domain. They trained the network to predict the next input symbol in grammatical sequences in the first domain, and showed that the network was able to learn to predict grammatical sequences in the second domain more effectively than it would have learned them without the prior learning. During the training in the second domain, they had to freeze the weights of a part of the network to prevent catastrophic forgetting. They used this simulation paradigm to emulate and analyze domain transfer, effect of similarity between training and test sequences, and the effect of n-gram information in human data. Hanson et al. ([5]) also showed that a prior learning of a grammar facilitates the learning of a new grammar in the cases where either the syntax or the vocabulary was kept constant. In this study we investigate grammar transfer by a neural network, where both syntax and vocabularies are different from the source grammar to the target grammar. Unlike Dienes et al.'s network, all weights in the network are allowed to change dur- ing the learning of the target grammar, which allows us to investigate interference as well as transfer from the source grammar to the target grammar.


A Bayesian Model Predicts Human Parse Preference and Reading Times in Sentence Processing

Neural Information Processing Systems

Narayanan and Jurafsky (1998) proposed that human language comprehension can be modeled by treating human comprehenders as Bayesian reasoners, and modeling the comprehension process with Bayesian decision trees. In this paper we extend the Narayanan and Jurafsky model to make further predictions about reading time given the probability of difference parses or interpretations, and test the model against reading time data from a psycholinguistic experiment.


A Rational Analysis of Cognitive Control in a Speeded Discrimination Task

Neural Information Processing Systems

We are interested in the mechanisms by which individuals monitor and adjust their performance of simple cognitive tasks. We model a speeded discrimination task in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). Response conflict arises when one stimulus class is infrequent relative to another, resulting in more errors and slower reaction times for the infrequent class. How do control processes modulate behavior based on the relative class frequencies? We explain performance from a rational perspective that casts the goal of individuals as minimizing a cost that depends both on error rate and reaction time.



Fragment Completion in Humans and Machines

Neural Information Processing Systems

Partial information can trigger a complete memory. At the same time, human memory is not perfect. A cue can contain enough information to specify an item in memory, but fail to trigger that item. In the context of word memory, we present experiments that demonstrate some basic patterns in human memory errors. We use cues that consist of word fragments. We show that short and long cues are completed more accurately than medium length ones and study some of the factors that lead to this behavior. We then present a novel computational model that shows some of the flexibility and patterns of errors that occur in human memory.


Probabilistic principles in unsupervised learning of visual structure: human data and a model

Neural Information Processing Systems

To find out how the representations of structured visual objects depend on the co-occurrence statistics of their constituents, we exposed subjects to a set of composite images with tight control exerted over (1) the conditional probabilities of the constituent fragments, and (2) the value of Barlow's criterion of "suspicious coincidence" (the ratio of joint probability to the product of marginals). We then compared the part verification response times for various probe/target combinations before and after the exposure. For composite probes, the speedup was much larger for targets that contained pairs of fragments perfectly predictive of each other, compared to those that did not. This effect was modulated by the significance of their co-occurrence as estimated by Barlow's criterion. For lone-fragment probes, the speedup in all conditions was generally lower than for composites. These results shed light on the brain's strategies for unsupervised acquisition of structural information in vision.