Technology
Learning Hierarchical Structures with Linear Relational Embedding
Paccanaro, Alberto, Hinton, Geoffrey E.
We present Linear Relational Embedding (LRE), a new method of learning a distributed representation of concepts from data consisting of instances of relations between given concepts. Its final goal is to be able to generalize, i.e. infer new instances of these relations among the concepts. On a task involving family relationships we show that LRE can generalize better than any previously published method. We then show how LRE can be used effectively to find compact distributed representations for variable-sized recursive data structures, such as trees and lists.
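As a rough illustration of the idea (a hypothetical sketch, not the authors' implementation), the snippet below treats each concept as a learned vector and each relation as a learned matrix, and scores a triple by how close the transformed subject vector lands to the object vector; all names and sizes here are assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_concepts, n_relations, dim = 24, 12, 6      # hypothetical sizes

C = rng.normal(size=(n_concepts, dim))        # one embedding vector per concept
R = rng.normal(size=(n_relations, dim, dim))  # one linear map per relation

def score(subj, rel, obj):
    """Negative squared distance between R[rel] @ C[subj] and C[obj]."""
    pred = R[rel] @ C[subj]
    return -np.sum((pred - C[obj]) ** 2)

def predict_object(subj, rel):
    """Infer the most plausible object for (subj, rel) as the nearest concept embedding."""
    pred = R[rel] @ C[subj]
    return int(np.argmin(np.sum((C - pred) ** 2, axis=1)))

In LRE these parameters would be fit so that observed relation instances score higher than alternatives; the sketch only shows the scoring geometry.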
The Unified Propagation and Scaling Algorithm
In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.
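For context on the approximation being referenced, the standard Bethe free energy over singleton beliefs b_i and pairwise beliefs b_{ij} on a graph with potentials psi (the form popularized by Yedidia, Freeman and Weiss, not necessarily this paper's exact objective) is

F_{\mathrm{Bethe}} = \sum_{(i,j)} \sum_{x_i, x_j} b_{ij}(x_i, x_j) \ln \frac{b_{ij}(x_i, x_j)}{\psi_{ij}(x_i, x_j)\,\psi_i(x_i)\,\psi_j(x_j)} \;-\; \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \ln \frac{b_i(x_i)}{\psi_i(x_i)},

where d_i is the degree of node i. Minimizing it subject to the marginalization constraints \sum_{x_j} b_{ij}(x_i, x_j) = b_i(x_i) yields the fixed points of loopy belief propagation, which is the connection the abstract alludes to.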
Learning Spike-Based Correlations and Conditional Probabilities in Silicon
Shon, Aaron P., Hsu, David, Diorio, Chris
We have designed and fabricated a VLSI synapse that can learn a conditional probability or correlation between spike-based inputs and feedback signals. The synapse is low power, compact, provides nonvolatile weight storage, and can perform simultaneous multiplication and adaptation. We can calibrate arrays of synapses to ensure uniform adaptation characteristics. Finally, adaptation in our synapse does not necessarily depend on the signals used for computation. Consequently, our synapse can implement learning rules that correlate past and present synaptic activity. We provide analysis and experimental chip results demonstrating the operation in learning and calibration mode, and show how to use our synapse to implement various learning rules in silicon.
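Purely as a software analogue of the kind of rule such a synapse can realize (the chip implements this in analog circuitry; the function and constants below are hypothetical), a correlational update between a spike input and a feedback signal might look like:

def update_weight(w, pre_spike, feedback, lr=0.01):
    """Hypothetical correlational rule: move w toward the feedback value whenever
    a presynaptic spike occurs, giving a running estimate of P(feedback | spike)."""
    if pre_spike:
        w += lr * (feedback - w)
    return w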
Blind Source Separation via Multinode Sparse Representation
Zibulevsky, Michael, Kisilev, Pavel, Zeevi, Yehoshua Y., Pearlmutter, Barak A.
We consider a problem of blind source separation from a set of instantaneous linear mixtures, where the mixing matrix is unknown. It was discovered recently that exploiting the sparsity of sources in an appropriate representation according to some signal dictionary dramatically improves the quality of separation. In this work we use the property of multiscale transforms, such as wavelets or wavelet packets, to decompose signals into sets of local features with various degrees of sparsity. We use this intrinsic property for selecting the best (most sparse) subsets of features for further separation. The performance of the algorithm is verified on noise-free and noisy data. Experiments with simulated signals, musical sounds and images demonstrate significant improvement of separation quality over previously reported results.
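A rough sketch of the feature-selection step (hypothetical code, not the authors' implementation), using the pywt wavelet-packet transform and an l1/l2 sparsity score, might be:

import numpy as np
import pywt

def sparsest_nodes(signal, wavelet="db4", level=3, n_keep=2):
    """Decompose a 1-D signal into wavelet-packet nodes and return the paths of
    the coefficient sets with the smallest l1/l2 ratio, i.e. the sparsest ones."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level)

    def l1_l2(c):
        c = np.asarray(c, dtype=float)
        return np.abs(c).sum() / (np.sqrt((c ** 2).sum()) + 1e-12)

    scored = sorted(nodes, key=lambda n: l1_l2(n.data))
    return [n.path for n in scored[:n_keep]]

The separation itself would then be carried out on the mixture coefficients restricted to these sparsest subsets, where the geometry of the mixing directions is most visible.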
Reinforcement Learning with Long Short-Term Memory
This paper presents reinforcement learning with a Long Short-Term Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(λ) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task. 1 Introduction Reinforcement learning (RL) is a way of learning how to behave based on delayed reward signals [12]. Among the more important challenges for RL are tasks where part of the state of the environment is hidden from the agent. Such tasks are called non-Markovian tasks or Partially Observable Markov Decision Processes. Many real-world tasks have this problem of hidden state. For instance, in a navigation task different positions in the environment may look the same, but one and the same action may lead to different next states or rewards. Thus, hidden state makes RL more realistic.
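As background on the advantage-learning component (a hypothetical sketch of the one-step, Baird-style target only; the eligibility-trace part implied by the λ and the LSTM value network are omitted), the temporal-difference correction is rescaled by a constant κ > 0:

def advantage_target(r, gamma, kappa, A_s, A_next):
    """One-step advantage-learning target, where A_s and A_next hold the advantage
    values of all actions in the current and next state; V(s) = max_a A(s, a)."""
    v_s, v_next = max(A_s), max(A_next)
    return v_s + (r + gamma * v_next - v_s) / kappa

Dividing the TD error by κ sharpens the gap between the best action and the rest, which is the usual motivation for preferring advantage learning over plain Q-learning with function approximation.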
Relative Density Nets: A New Way to Combine Backpropagation with HMM's
Brown, Andrew D., Hinton, Geoffrey E.
Logistic units in the first hidden layer of a feedforward neural network compute the relative probability of a data point under two Gaussians. This leads us to consider substituting other density models. We present an architecture for performing discriminative learning of Hidden Markov Models using a network of many small HMM's. Experiments on speech data show it to be superior to the standard method of discriminatively training HMM's. 1 Introduction A standard way of performing classification using a generative model is to divide the training cases into their respective classes and then train a set of class conditional models. This unsupervised approach to classification is appealing for two reasons.
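The opening claim can be made concrete with a short derivation (standard textbook material rather than anything specific to this paper): for two Gaussian class-conditional densities with shared covariance Σ and priors π_1, π_2, the posterior of class 1 is a logistic function of a linear projection of the input,

P(C_1 \mid \mathbf{x}) = \frac{p(\mathbf{x}\mid C_1)\,\pi_1}{p(\mathbf{x}\mid C_1)\,\pi_1 + p(\mathbf{x}\mid C_2)\,\pi_2} = \sigma\!\left( \ln \frac{p(\mathbf{x}\mid C_1)\,\pi_1}{p(\mathbf{x}\mid C_2)\,\pi_2} \right) = \sigma(\mathbf{w}^{\top}\mathbf{x} + b), \qquad \mathbf{w} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2),

with b collecting the terms that do not depend on x. Replacing the two Gaussians by other density models, such as small HMM's, is the substitution the abstract describes.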
Incremental A*
Incremental search techniques find optimal solutions to series of similar search tasks much faster than is possible by solving each search task from scratch. While researchers have developed incremental versions of uninformed search methods, we develop an incremental version of A*. The first search of Lifelong Planning A* is the same as that of A* but all subsequent searches are much faster because it reuses those parts of the previous search tree that are identical to the new search tree. We then present experimental results that demonstrate the advantages of Lifelong Planning A* for simple route planning tasks.
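To give a flavor of the bookkeeping involved (a condensed sketch of two standard Lifelong Planning A* quantities; the graph interface and heuristic h are assumed names): each node keeps a g-value and a one-step lookahead rhs-value, the queue orders nodes by a two-component key, and only nodes whose g and rhs disagree need to be re-expanded after edge costs change.

def rhs(node, g, graph, start):
    """One-step lookahead value: best predecessor g-value plus edge cost."""
    if node == start:
        return 0.0
    return min(g[p] + cost for p, cost in graph.predecessors(node))

def key(node, g, rhs_val, h):
    """Priority used to order the queue; lexicographically smaller keys are expanded first."""
    m = min(g[node], rhs_val[node])
    return (m + h(node), m)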
Discriminative Direction for Kernel Classifiers
In many scientific and engineering applications, detecting and understanding differences between two groups of examples can be reduced to a classical problem of training a classifier for labeling new examples while making as few mistakes as possible. In the traditional classification setting, the resulting classifier is rarely analyzed in terms of the properties of the input data captured by the discriminative model. However, such analysis is crucial if we want to understand and visualize the detected differences. We propose an approach to interpretation of the statistical model in the original feature space that allows us to argue about the model in terms of the relevant changes to the input vectors. For each point in the input space, we define a discriminative direction to be the direction that moves the point towards the other class while introducing as little irrelevant change as possible with respect to the classifier function. We derive the discriminative direction for kernel-based classifiers, demonstrate the technique on several examples and briefly discuss its use in statistical shape analysis, an application that originally motivated this work.
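As a concrete, hedged illustration for one common kernel classifier (an RBF-kernel SVM fitted with scikit-learn; following the gradient of the decision function is one natural way to realize such a direction, not necessarily the paper's exact derivation):

import numpy as np
from sklearn.svm import SVC

def discriminative_direction(clf: SVC, x, gamma):
    """Gradient of an RBF-SVM decision function f(x) = sum_k a_k K(x_k, x) + b at x,
    pointing toward the positive class; normalized to give a unit direction."""
    sv = clf.support_vectors_        # support vectors x_k
    a = clf.dual_coef_[0]            # signed dual coefficients a_k = y_k * alpha_k
    diffs = sv - x                   # x_k - x
    k_vals = np.exp(-gamma * np.sum(diffs ** 2, axis=1))
    # d/dx exp(-gamma * ||x - x_k||^2) = 2 * gamma * (x_k - x) * K(x_k, x)
    grad = 2.0 * gamma * (a * k_vals) @ diffs
    return grad / (np.linalg.norm(grad) + 1e-12)

Moving a test point a small step along this direction (or its negative) shows which input changes the classifier considers relevant to the group difference, which is the visualization use case the abstract mentions.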