Structure Learning in Human Causal Induction
Tenenbaum, Joshua B., Griffiths, Thomas L.
We use graphical models to explore the question of how people learn simple causal relationships from data. The two leading psychological theories can both be seen as estimating the parameters of a fixed graph. We argue that a complete account of causal induction should also consider how people learn the underlying causal graph structure, and we propose to model this inductive process as a Bayesian inference. Our argument is supported through the discussion of three data sets. 1 Introduction Causality plays a central role in human mental life. Our behavior depends upon our understanding of the causal structure of our environment, and we are remarkably good at inferring causation from mere observation. Constructing formal models of causal induction is currently a major focus of attention in computer science [7], psychology [3, 6], and philosophy [5]. This paper attempts to connect these literatures, by framing the debate between two major psychological theories in the computational language of graphical models. We show that existing theories equate human causal induction with maximum likelihood parameter estimation on a fixed graphical structure, and we argue that to fully account for human behavioral data, we must also postulate that people make Bayesian inferences about the underlying causal graph structure itself.
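As a concrete (hypothetical) sketch of the structural inference proposed here: compare the marginal likelihood of a graph containing a cause-to-effect edge against one without it, given contingency counts. The noisy-OR parameterization, uniform parameter priors, grid integration, and all names below are our illustrative choices, not the paper's code.

```python
import numpy as np

# Counts: ne_c1 effects out of n_c1 trials with candidate cause C present,
# ne_c0 out of n_c0 with C absent. Parameters are averaged over a uniform grid.

def log_marginal_graph0(ne_c1, n_c1, ne_c0, n_c0, grid=400):
    # Graph 0: the effect depends only on a background cause of strength w0.
    w0 = np.linspace(1e-6, 1 - 1e-6, grid)
    k, n = ne_c1 + ne_c0, n_c1 + n_c0
    lik = w0**k * (1 - w0)**(n - k)
    return np.log(lik.mean())              # grid mean ~ integral over uniform prior

def log_marginal_graph1(ne_c1, n_c1, ne_c0, n_c0, grid=400):
    # Graph 1: noisy-OR of background (w0) and candidate cause (w1).
    w0 = np.linspace(1e-6, 1 - 1e-6, grid)[:, None]
    w1 = np.linspace(1e-6, 1 - 1e-6, grid)[None, :]
    p1 = 1 - (1 - w0) * (1 - w1)           # P(effect | C present)
    lik = (p1**ne_c1 * (1 - p1)**(n_c1 - ne_c1)
           * w0**ne_c0 * (1 - w0)**(n_c0 - ne_c0))
    return np.log(lik.mean())

def causal_support(ne_c1, n_c1, ne_c0, n_c0):
    # Log marginal-likelihood ratio in favor of the causal edge.
    return (log_marginal_graph1(ne_c1, n_c1, ne_c0, n_c0)
            - log_marginal_graph0(ne_c1, n_c1, ne_c0, n_c0))

print(causal_support(7, 8, 1, 8))          # effect tracks C: support > 0
```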
Decomposition of Reinforcement Learning for Admission Control of Self-Similar Call Arrival Processes
In multi-service communications networks, such as Asynchronous Transfer Mode (ATM) networks, resource control is of crucial importance for the network operator as well as for the users. The objective is to maintain the service quality while maximizing the operator's revenue. At the call level, service quality (Grade of Service) is measured in terms of call blocking probabilities, and the key resource to be controlled is bandwidth. Network routing and call admission control (CAC) are two such resource control problems. Markov decision processes offer a framework for optimal CAC and routing [1]. By modelling the dynamics of the network with traffic and computing control policies using dynamic programming [2], resource control is optimized. A standard assumption in such models is that calls arrive according to Poisson processes. This makes the models of the dynamics relatively simple. Although the Poisson assumption is valid for most user-initiated requests in communications networks, a number of studies [3, 4, 5] indicate that many types of arrival processes in wide-area networks as well as in local area networks are statistically self-similar.
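To make the MDP framing concrete, here is a toy single-link admission-control model solved by value iteration under the Poisson assumption the abstract goes on to question; the capacity, rates, revenue, and discount below are illustrative, not taken from the paper.

```python
import numpy as np

# Toy uniformized birth-death CAC model: state s = calls in progress.
# On an arrival the controller accepts (revenue r) or blocks; calls
# depart independently. All parameters are illustrative.
C, lam, mu, r, gamma = 10, 0.3, 0.05, 1.0, 0.99

V = np.zeros(C + 1)
for _ in range(3000):
    V_new = np.empty_like(V)
    for s in range(C + 1):
        p_dep = s * mu                                   # departure prob. this step
        accept = r + gamma * V[s + 1] if s < C else -np.inf
        reject = gamma * V[s]
        V_new[s] = (lam * max(accept, reject)            # arrival branch
                    + p_dep * gamma * V[max(s - 1, 0)]   # departure branch
                    + (1 - lam - p_dep) * gamma * V[s])  # no event
    V = V_new

policy = ["accept" if s < C and r + gamma * V[s + 1] >= gamma * V[s] else "block"
          for s in range(C + 1)]
print(policy)   # typically accepts until the link nears capacity
```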
Hierarchical Memory-Based Reinforcement Learning
Hernandez-Gardiol, Natalia, Mahadevan, Sridhar
A key challenge for reinforcement learning is scaling up to large partially observable domains. In this paper, we show how a hierarchy of behaviors can be used to create and select among variable length short-term memories appropriate for a task. At higher levels in the hierarchy, the agent abstracts over lower-level details and looks back over a variable number of high-level decisions in time. We formalize this idea in a framework called Hierarchical Suffix Memory (HSM). HSM uses a memory-based SMDP learning method to rapidly propagate delayed reward across long decision sequences.
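A minimal sketch of the suffix-memory idea as we read it (our simplification, not the HSM algorithm itself): value estimates are keyed by (observation-suffix, action) pairs, and lookup backs off to the longest suffix for which experience has been recorded.

```python
from collections import defaultdict

class SuffixMemory:
    """Hypothetical simplification: variable-length suffixes as state keys."""

    def __init__(self, max_len=4):
        self.q = defaultdict(float)   # value estimate per (suffix, action)
        self.n = defaultdict(int)     # visit counts
        self.max_len = max_len

    def _lookup_key(self, history, action):
        # Back off from the longest allowed suffix to shorter ones.
        for k in range(min(self.max_len, len(history)), 0, -1):
            key = (tuple(history[-k:]), action)
            if self.n[key] > 0:
                return key
        return (tuple(history[-1:]), action)

    def value(self, history, action):
        return self.q[self._lookup_key(history, action)]

    def update(self, history, action, ret, alpha=0.1):
        k = min(self.max_len, len(history))
        key = (tuple(history[-k:]), action)
        self.n[key] += 1
        self.q[key] += alpha * (ret - self.q[key])
```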
Hippocampally-Dependent Consolidation in a Hierarchical Model of Neocortex
In memory consolidation, declarative memories, which initially require the hippocampus for their recall, ultimately become independent of it. Consolidation has been the focus of numerous experimental and qualitative modeling studies, but relatively little quantitative exploration. We present a consolidation model in which hierarchical connections in the cortex, which initially instantiate purely semantic information acquired through probabilistic unsupervised learning, come to instantiate episodic information as well. The hippocampus is responsible for helping complete partial input patterns before consolidation is complete, while also training the cortex to perform appropriate completion by itself.
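As a toy stand-in for the pattern-completion role ascribed to the hippocampus (the paper's model is hierarchical and probabilistic; this Hopfield-style associative memory is only illustrative):

```python
import numpy as np

# Store a few binary "episodes" Hebbian-style, then complete a partial cue.
rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(5, 64))     # stored episodes
W = patterns.T @ patterns / patterns.shape[1]    # Hebbian weight matrix
np.fill_diagonal(W, 0)

cue = patterns[0].astype(float)
cue[32:] = 0                                     # half the pattern is missing
x = cue.copy()
for _ in range(10):                              # iterate toward a fixed point
    x = np.sign(W @ x)
    x[x == 0] = 1
print((x == patterns[0]).mean())                 # fraction of units recovered
```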
Learning Sparse Image Codes using a Wavelet Pyramid Architecture
Olshausen, Bruno A., Sallee, Phil, Lewicki, Michael S.
We show how a wavelet basis may be adapted to best represent natural images in terms of sparse coefficients. The wavelet basis, which may be either complete or overcomplete, is specified by a small number of spatial functions which are repeated across space and combined in a recursive fashion so as to be self-similar across scale. These functions are adapted to minimize the estimated code length under a model that assumes images are composed of a linear superposition of sparse, independent components. When adapted to natural images, the wavelet bases take on different orientations and they evenly tile the orientation domain, in stark contrast to the standard, non-oriented wavelet bases used in image compression. When the basis set is allowed to be overcomplete, it also yields higher coding efficiency than standard wavelet bases. 1 Introduction The general problem we address here is that of learning efficient codes for representing natural images.
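The coefficient-inference step under such a model amounts to L1-penalized regression onto the basis; a minimal ISTA solver for a fixed basis Phi is sketched below (the paper additionally adapts the basis itself; the solver choice and sizes are ours, not the paper's method).

```python
import numpy as np

def ista(x, Phi, lam=0.1, n_iter=200):
    """Sparse coefficients a minimizing ||x - Phi a||^2 + lam * |a|_1."""
    L = np.linalg.norm(Phi, 2) ** 2              # Lipschitz constant of the gradient
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        a = a - Phi.T @ (Phi @ a - x) / L        # gradient step on the data term
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

# Overcomplete toy basis: 64-pixel patches, 128 basis functions.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 128))
x = Phi @ (rng.standard_normal(128) * (rng.random(128) < 0.05))
print(np.count_nonzero(np.abs(ista(x, Phi)) > 1e-3))   # few active coefficients
```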
On a Connection between Kernel PCA and Metric Multidimensional Scaling
We show a direct relationship between kernel PCA and metric multidimensional scaling (MDS). This leads to a metric MDS algorithm where the desired configuration of points is found via the solution of an eigenproblem rather than through the iterative optimization of the stress objective function. The question of kernel choice is also discussed. 1 Introduction Suppose we are given n objects, and for each pair (i, j) we have a measurement of the "dissimilarity" δij between the two objects. In multidimensional scaling (MDS) the aim is to place n points in a low dimensional space (usually Euclidean) so that the interpoint distances dij have a particular relationship to the original dissimilarities. In classical scaling we would like the interpoint distances to be equal to the dissimilarities. For example, classical scaling can be used to reconstruct a map of the locations of some cities given the distances between them.
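The eigenproblem route is easiest to see in classical scaling itself: double-center the squared dissimilarities and read coordinates off the top eigenvectors. A minimal sketch (function name ours):

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed n points in R^k from an n x n matrix of pairwise dissimilarities D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gram matrix of centered points
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]              # top-k eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

# e.g. pass a matrix of inter-city distances to recover a 2-D "map"
# (up to rotation and reflection), as in the abstract's example.
```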
Foundations for a Circuit Complexity Theory of Sensory Processing
Legenstein, Robert A., Maass, Wolfgang
We introduce total wire length as a salient complexity measure for an analysis of the circuit complexity of sensory processing in biological neural systems and neuromorphic engineering. This new complexity measure is applied to a set of basic computational problems that apparently need to be solved by circuits for translation- and scale-invariant sensory processing. We exhibit new circuit design strategies for these new benchmark functions that can be implemented within realistic complexity bounds, in particular with linear or almost linear total wire length. 1 Introduction Circuit complexity theory is a classical area of theoretical computer science that provides estimates for the complexity of circuits for computing specific benchmark functions, such as binary addition, multiplication, and sorting (see, e.g.
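The measure itself is elementary to compute for a concrete layout; what the paper bounds is how it must scale for the benchmark functions. A sketch, assuming Euclidean wire lengths and an illustrative layout:

```python
import numpy as np

def total_wire_length(pos, wires):
    # pos: 2-D coordinates of circuit nodes; wires: (i, j) index pairs.
    pos = np.asarray(pos, dtype=float)
    return sum(np.linalg.norm(pos[i] - pos[j]) for i, j in wires)

# A 4-node chain on a unit-spaced line: total wire length 3.0.
print(total_wire_length([(0, 0), (1, 0), (2, 0), (3, 0)],
                        [(0, 1), (1, 2), (2, 3)]))
```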
Algebraic Information Geometry for Learning Machines with Singularities
Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and Gaussian mixtures, asymptotic normality does not hold, since Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexity is clarified based on resolution of singularities, and two different problems are studied.
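For orientation, the asymptotic form established in this line of work takes the following shape, with F(n) the stochastic complexity of n examples, and the rational number λ and its multiplicity m obtained by resolution of singularities (regular models recover λ = d/2, m = 1, i.e. the BIC term); notation ours:

```latex
F(n) = \lambda \log n - (m - 1) \log \log n + O(1)
```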
The Interplay of Symbolic and Subsymbolic Processes in Anagram Problem Solving
Grimes, David B., Mozer, Michael C.
Although connectionist models have provided insights into the nature of perception and motor control, connectionist accounts of higher cognition seldom go beyond an implementation of traditional symbol-processing theories. We describe a connectionist constraint satisfaction model of how people solve anagram problems. The model exploits statistics of English orthography, but also addresses the interplay of subsymbolic and symbolic computation by a mechanism that extracts approximate symbolic representations (partial orderings of letters) from subsymbolic structures and injects the extracted representation back into the model to assist in the solution of the anagram. We show the computational benefit of this extraction-injection process and discuss its relationship to conscious mental processes and working memory. We also account for experimental data concerning the difficulty of anagram solution based on the orthographic structure of the anagram string and the target word.
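As a toy illustration of exploiting orthographic statistics (the model is a connectionist constraint-satisfaction network, not a permutation search; the bigram weights below are illustrative stand-ins, not real counts):

```python
from itertools import permutations

# Tiny illustrative bigram table: higher weight = more English-like.
BIGRAM = {"an": 1.99, "nd": 1.35, "th": 3.56, "he": 3.07,
          "in": 2.43, "er": 2.05, "re": 1.85, "on": 1.76}

def orthographic_score(s):
    # Sum of bigram weights; unseen bigrams get a small floor value.
    return sum(BIGRAM.get(s[i:i + 2], 0.01) for i in range(len(s) - 1))

candidates = {"".join(p) for p in permutations("dna")}
print(max(candidates, key=orthographic_score))   # -> "and"
```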