Country
Restricted Value Iteration: Theory and Algorithms
Value iteration is a popular algorithm for finding near optimal policies for POMDPs. It is inefficient due to the need to account for the entire belief space, which necessitates the solution of large numbers of linear programs. In this paper, we study value iteration restricted to belief subsets. We show that, together with properly chosen belief subsets, restricted value iteration yields near-optimal policies and we give a condition for determining whether a given belief subset would bring about savings in space and time. We also apply restricted value iteration to two interesting classes of POMDPs, namely informative POMDPs and near-discernible POMDPs.
Extremal Behaviour in Multiagent Contract Negotiation
We examine properties of a model of resource allocation in which several agents exchange resources in order to optimise their individual holdings. The schemes discussed relate to well-known negotiation protocols proposed in earlier work and we consider a number of alternative notions of ``rationality'' covering both quantitative measures, e.g. cooperative and individual rationality and more qualitative forms, e.g. Pigou-Dalton transfers. While it is known that imposing particular rationality and structural restrictions may result in some reallocations of the resource set becoming unrealisable, in this paper we address the issue of the number of restricted rational deals that may be required to implement a particular reallocation when it is possible to do so. We construct examples showing that this number may be exponential (in the number of resources m), even when all of the agent utility functions are monotonic. We further show that k agents may achieve in a single deal a reallocation requiring exponentially many rational deals if at most k-1 agents can participate, this same reallocation being unrealisable by any sequences of rational deals in which at most k-2 agents are involved.
Finding Approximate POMDP solutions Through Belief Compression
Roy, N., Gordon, G., Thrun, S.
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional subspace embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this subspace can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta & Schapire, 2002) to represent sparse, high-dimensional belief spaces using small sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
Pairwise Clustering and Graphical Models
Shental, Noam, Zomet, Assaf, Hertz, Tomer, Weiss, Yair
Significant progress in clustering has been achieved by algorithms that are based on pairwise affinities between the datapoints. In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on efficient eigenvector calculations. However,spectral methods lack a straightforward probabilistic interpretation which makes it difficult to automatically set parameters using trainingdata. In this paper we use the previously proposed typical cut framework for pairwise clustering. We show an equivalence between calculating the typical cut and inference in an undirected graphical model. We show that for clustering problems with hundreds of datapoints exact inference may still be possible. For more complicated datasets, we show that loopy belief propagation(BP) and generalized belief propagation (GBP) can give excellent results on challenging clustering problems. We also use graphical modelsto derive a learning algorithm for affinity matrices based on labeled data.
Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian
The online incremental gradient (or backpropagation) algorithm is widely considered to be the fastest method for solving large-scale neural-network (NN) learning problems. In contrast, we show that an appropriately implemented iterative batch-mode (or block-mode) learning method can be much faster. For example, it is three times faster in the UCI letter classification problem (26 outputs, 16,000 data items, 6,066 parameters with a two-hidden-layer multilayer perceptron) and 353 times faster in a nonlinear regression problem arising in color recipe prediction (10 outputs, 1,000 data items, 2,210 parameters with a neuro-fuzzy modular network). The three principal innovative ingredients in our algorithm are the following: First, we use scaled trust-region regularization with inner-outer iteration tosolve the associated "overdetermined" nonlinear least squares problem, where the inner iteration performs a truncated (or inexact) Newton method. Second, we employ Pearlmutter's implicit sparse Hessian matrix-vector multiply algorithm to construct theKrylov subspaces used to solve for the truncated Newton update. Third, we exploit sparsity (for preconditioning) in the matrices resulting from the NNs having many outputs.
Finding the M Most Probable Configurations using Loopy Belief Propagation
Loopy belief propagation (BP) has been successfully used in a number ofdifficult graphical models to find the most probable configuration ofthe hidden variables. In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M. While this problem has been solved using the junction tree formalism, in many real world problems theclique size in the junction tree is prohibitively large. In this work we address the problem of finding the M best configurations whenexact inference is impossible. We start by developing a new exact inference algorithm for calculating thebest configurations that uses only max-marginals. For approximate inference,we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. We show empirically thatthe algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables.
A Mixed-Signal VLSI for Real-Time Generation of Edge-Based Image Vectors
Yagi, Masakazu, Yamasaki, Hideo, Shibata, Tadashi
A mixed-signal image filtering VLSI has been developed aiming at real-time generation of edge-based image vectors for robust image recognition. A four-stage asynchronous median detection architecture basedon analog digital mixed-signal circuits has been introduced todetermine the threshold value of edge detection, the key processing parameter in vector generation. As a result, a fully seamless pipeline processing from threshold detection to edge feature mapgeneration has been established. A prototype chip was designed in a 0.35-µm double-polysilicon three-metal-layer CMOS technology and the concept was verified by the fabricated chip. The chip generates a 64-dimension feature vector from a 64x64-pixel gray scale image every 80µsec.
Analytical Solution of Spike-timing Dependent Plasticity Based on Synaptic Biophysics
Porr, Bernd, Saudargiene, Ausra, Wörgötter, Florentin
Spike timing plasticity (STDP) is a special form of synaptic plasticity where the relative timing of post-and presynaptic activity determines the change of the synaptic weight. On the postsynaptic side, active backpropagating spikesin dendrites seem to play a crucial role in the induction of spike timing dependent plasticity. We argue that postsynaptically the temporal change of the membrane potential determines the weight change. Coming from the presynaptic side induction of STDP is closely related to the activation of NMDA channels. Therefore, we will calculate analytically the change of the synaptic weight by correlating the derivative ofthe membrane potential with the activity of the NMDA channel.
Pairwise Clustering and Graphical Models
Shental, Noam, Zomet, Assaf, Hertz, Tomer, Weiss, Yair
Significant progress in clustering has been achieved by algorithms that are based on pairwise affinities between the datapoints. In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on efficient eigenvector calculations. However, spectral methods lack a straightforward probabilistic interpretation which makes it difficult to automatically set parameters using training data. In this paper we use the previously proposed typical cut framework for pairwise clustering. We show an equivalence between calculating the typical cut and inference in an undirected graphical model. We show that for clustering problems with hundreds of datapoints exact inference may still be possible. For more complicated datasets, we show that loopy belief propagation (BP) and generalized belief propagation (GBP) can give excellent results on challenging clustering problems. We also use graphical models to derive a learning algorithm for affinity matrices based on labeled data.