Technology
The Interplay of Symbolic and Subsymbolic Processes in Anagram Problem Solving
Grimes, David B., Mozer, Michael C.
Although connectionist models have provided insights into the nature of perception and motor control, connectionist accounts of higher cognition seldom go beyond an implementation of traditional symbol-processing theories. We describe a connectionist constraint satisfaction model of how people solve anagram problems. The model exploits statistics of English orthography, but also addresses the interplay of subsymbolic and symbolic computation by a mechanism that extracts approximate symbolic representations (partial orderings of letters) from subsymbolic structures and injects the extracted representation back into the model to assist in the solution of the anagram. We show the computational benefit of this extraction-injection process and discuss its relationship to conscious mental processes and working memory. We also account for experimental data concerning the difficulty of anagram solution based on the orthographic structure of the anagram string and the target word.
A Productive, Systematic Framework for the Representation of Visual Structure
Edelman, Shimon, Intrator, Nathan
For example, priming in a subliminal perception task was found to be confined to a quadrant of the visual field [16]. The notion that the representation of an object may be tied to a particular location in the visual field where it is first observed is compatible with the concept of object file, a hypothetical record created by the visual system for every encountered object, which persists as long as the object is observed. Moreover, location (as it figures in the CoF model) should be interpreted relative to the focus of attention, rather than retinotopically [17]. The idea that global relationships (hence, large-scale structure) have precedence over local ones [18], which is central to our approach, has withstood extensive testing in the past two decades. Even with the perceptual salience of the global and local structure equated, subjects are able to process the relations among elements before the elements themselves are identified [19]. More generally, humans are limited in their ability to represent spatial structure, in that the representation of spatial relations requires spatial attention.
Who Does What? A Novel Algorithm to Determine Function Localization
Aharonov-Barki, Ranit, Meilijson, Isaac, Ruppin, Eytan
We introduce a novel algorithm, termed PPA (Performance Prediction Algorithm), that quantitatively measures the contributions of elements of a neural system to the tasks it performs. The algorithm identifies the neurons or areas which participate in a cognitive or behavioral task, given data about performance decrease in a small set of lesions. It also allows the accurate prediction of performances due to multi-element lesions. The effectiveness of the new algorithm is demonstrated in two models of recurrent neural networks with complex interactions among the elements. The algorithm is scalable and applicable to the analysis of large neural networks. Given the recent advances in reversible inactivation techniques, it has the potential to significantly contribute to the understanding of the organization of biological nervous systems, and to shed light on the long-lasting debate about local versus distributed computation in the brain.
From Mixtures of Mixtures to Adaptive Transform Coding
Archer, Cynthia, Leen, Todd K.
We establish a principled framework for adaptive transform coding. Transform coders are often constructed by concatenating an ad hoc choice of transform with suboptimal bit allocation and quantizer design. Instead, we start from a probabilistic latent variable model in the form of a mixture of constrained Gaussian mixtures. From this model we derive a transform coding algorithm, which is a constrained version of the generalized Lloyd algorithm for vector quantizer design. A byproduct of our derivation is the introduction of a new transform basis, which unlike other transforms (PCA, DCT, etc.) is explicitly optimized for coding. Image compression experiments show that adaptive transform coders designed with our algorithm improve compressed image signal-to-noise ratio by up to 3 dB compared to global transform coding and by 0.5 to 2 dB compared to other adaptive transform coders. 1 Introduction Compression algorithms for image and video signals often use transform coding as a low-complexity alternative to vector quantization (VQ).
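As a rough illustration of the Lloyd iteration that the abstract builds on, the sketch below runs the plain, unconstrained algorithm on scalar samples. It is not the constrained transform-coding variant derived in the paper. The function name `lloyd_quantizer`, its parameters, and the toy data are assumptions made for this example only.

```python
def lloyd_quantizer(samples, codebook, iterations=20):
    """Plain (unconstrained) Lloyd iteration for a scalar quantizer.

    Illustrative only: alternates nearest-codeword assignment with
    centroid updates and records the mean squared distortion.
    """
    codebook = list(codebook)
    distortions = []
    for _ in range(iterations):
        # Assignment step: put each sample in the cell of its nearest codeword.
        cells = [[] for _ in codebook]
        for x in samples:
            nearest = min(range(len(codebook)),
                          key=lambda k: (x - codebook[k]) ** 2)
            cells[nearest].append(x)
        # Update step: move each codeword to the mean of its cell.
        # An empty cell keeps its previous codeword.
        codebook = [sum(cell) / len(cell) if cell else old
                    for cell, old in zip(cells, codebook)]
        # Mean squared distortion under the updated codebook.
        distortions.append(
            sum(min((x - c) ** 2 for c in codebook) for x in samples)
            / len(samples)
        )
    return codebook, distortions


# Toy data (hypothetical): two well-separated clusters.
toy_samples = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
toy_codebook, toy_distortions = lloyd_quantizer(toy_samples, [0.0, 1.0])
```

Each of the two steps can only lower the distortion or leave it unchanged, so the recorded values never increase.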
Generalizable Singular Value Decomposition for Ill-posed Datasets
Kjems, Ulrik, Hansen, Lars Kai, Strother, Stephen C.
Because the training examples in an ill-posed data set do not fully span the signal space, the observed training set variances in each basis vector will be too high compared to the average variance of the test set projections onto the same basis vectors. On the basis of this understanding we introduce the Generalizable Singular Value Decomposition (GenSVD) as a means to reduce this bias by re-estimation of the singular values obtained in a conventional Singular Value Decomposition, allowing for a generalization performance increase of a subsequent statistical model. We demonstrate that the algorithm successfully corrects bias in a data set from a functional PET activation study of the human brain. 1 Ill-posed Data Sets An ill-posed data set has more dimensions in each example than there are examples. Such data sets occur in many fields of research, typically in connection with image measurements. The associated statistical problem is that of extracting structure from the observed high-dimensional vectors in the presence of noise. The statistical analysis can be done either supervised (i.e.
Hierarchical Memory-Based Reinforcement Learning
Hernandez-Gardiol, Natalia, Mahadevan, Sridhar
A key challenge for reinforcement learning is scaling up to large partially observable domains. In this paper, we show how a hierarchy of behaviors can be used to create and select among variable length short-term memories appropriate for a task. At higher levels in the hierarchy, the agent abstracts over lower-level details and looks back over a variable number of high-level decisions in time. We formalize this idea in a framework called Hierarchical Suffix Memory (HSM). HSM uses a memory-based SMDP learning method to rapidly propagate delayed reward across long decision sequences.
Algorithms for Non-negative Matrix Factorization
Lee, Daniel D., Seung, H. Sebastian
Non-negative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minimize the conventional least squares error while the other minimizes the generalized Kullback-Leibler divergence. The monotonic convergence of both algorithms can be proven using an auxiliary function analogous to that used for proving convergence of the Expectation Maximization algorithm. The algorithms can also be interpreted as diagonally rescaled gradient descent, where the rescaling factor is optimally chosen to ensure convergence.
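The least-squares variant can be sketched in a few lines. The version below uses the standard multiplicative updates H ← H ∘ (WᵀV) / (WᵀWH) and W ← W ∘ (VHᵀ) / (WHHᵀ), written in plain Python so it runs without extra libraries. The function names, the random initialisation, the small `eps` guard against division by zero, and the toy matrix are assumptions of this example, not details taken from the abstract.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def squared_error(V, W, H):
    """Sum of squared entries of V - WH."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2
               for row_v, row_wh in zip(V, WH)
               for v, wh in zip(row_v, row_wh))


def nmf_least_squares(V, rank, iterations=200, seed=0, eps=1e-12):
    """Multiplicative-update NMF for the least-squares objective (sketch)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start (an assumption of this sketch).
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [squared_error(V, W, H)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise.
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(squared_error(V, W, H))
    return W, H, errors


# Toy non-negative matrix (hypothetical data) with an exact rank-2 factorization.
V_toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0]]
W_toy, H_toy, err_toy = nmf_least_squares(V_toy, rank=2)
```

Because every factor in the updates is non-negative, W and H stay non-negative without any explicit projection, and the recorded error should not increase from one iteration to the next, up to floating-point rounding.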
The Kernel Gibbs Sampler
Graepel, Thore, Herbrich, Ralf
We present an algorithm that samples the hypothesis space of kernel classifiers. A uniform prior over normalised weight vectors and a likelihood based on a model of label noise lead to a piecewise constant posterior that can be sampled by the kernel Gibbs sampler (KGS). The KGS is a Markov Chain Monte Carlo method that chooses a random direction in parameter space and samples from the resulting piecewise constant density along the line chosen. The KGS can be used as an analytical tool for the exploration of Bayesian transduction, Bayes point machines, active learning, and evidence-based model selection on small data sets that are contaminated with label noise. For a simple toy example we demonstrate experimentally how a Bayes point machine based on the KGS outperforms an SVM that is incapable of taking into account label noise. 1 Introduction Two great ideas have dominated recent developments in machine learning: the application of kernel methods and the popularisation of Bayesian inference. Focusing on the task of classification, various connections between the two areas exist: kernels have long been a part of Bayesian inference in the disguise of covariance functions that characterise priors over functions [9].
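The inner step of such a sampler, drawing a point from a piecewise constant density along a line, can be sketched as follows. This is only that one-dimensional step, not the kernel Gibbs sampler itself. The breakpoints and unnormalised heights are taken as given inputs, and the function name and toy values are assumptions made for this example.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def sample_piecewise_constant(breakpoints, heights, rng):
    """Draw one sample from an unnormalised piecewise constant density.

    breakpoints: sorted segment boundaries, length k + 1.
    heights: non-negative density value on each of the k segments.
    """
    if len(breakpoints) != len(heights) + 1:
        raise ValueError("need exactly one more breakpoint than heights")
    # Probability mass of each segment is height times width.
    masses = [h * (hi - lo)
              for h, lo, hi in zip(heights, breakpoints, breakpoints[1:])]
    cumulative = list(accumulate(masses))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("density has no mass")
    # Pick a segment with probability proportional to its mass.
    u = rng.random() * total
    index = min(bisect_right(cumulative, u), len(masses) - 1)
    # The density is flat inside a segment, so sample uniformly within it.
    return rng.uniform(breakpoints[index], breakpoints[index + 1])


# Toy density (hypothetical): height 1 on [0, 1], 0 on [1, 2], 2 on [2, 3].
rng = random.Random(0)
draws = [sample_piecewise_constant([0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 2.0], rng)
         for _ in range(2000)]
```

In the toy density the first segment carries one third of the mass and the last segment two thirds, so about a third of the draws should fall in [0, 1] and none strictly inside (1, 2).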
Sparse Representation for Gaussian Process Models
We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world datasets indicate the efficiency of the approach.