
A Solution for Missing Data in Recurrent Neural Networks with an Application to Blood Glucose Prediction

Neural Information Processing Systems

Volker Tresp and Thomas Briegel, Siemens AG Corporate Technology, Otto-Hahn-Ring 6, 81730 München, Germany

We consider neural network models for stochastic nonlinear dynamical systems where measurements of the variable of interest are only available at irregular intervals, i.e., most realizations are missing. Difficulties arise since the solutions for prediction and maximum likelihood learning with missing data lead to complex integrals, which even for simple cases cannot be solved analytically. In this paper we propose a specific combination of a nonlinear recurrent neural predictive model and a linear error model which leads to tractable prediction and maximum likelihood adaptation rules. In particular, the recurrent neural network can be trained using the real-time recurrent learning rule, and the linear error model can be trained by an EM adaptation rule implemented using forward-backward Kalman filter equations. The model is applied to predict the glucose/insulin metabolism of a diabetic patient where blood glucose measurements are only available a few times a day at irregular intervals.
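
For the linear error model, the E-step reduces to standard forward-backward Kalman recursions. Below is a minimal scalar sketch in Python; the AR(1) form, variable names, and noise parameters are illustrative assumptions rather than the paper's exact model. Missing glucose readings simply skip the measurement update:

```python
import numpy as np

def kalman_smoother(y, a, q, r, x0, p0):
    """Forward-backward (Rauch-Tung-Striebel) smoother for a scalar AR(1)
    state x_t = a*x_{t-1} + w_t (var q) observed as y_t = x_t + v_t (var r).
    Missing measurements are np.nan and simply skip the update step."""
    T = len(y)
    xp, pp = np.zeros(T), np.zeros(T)    # one-step predictions
    xf, pf = np.zeros(T), np.zeros(T)    # filtered estimates
    xp[0], pp[0] = x0, p0                # prior on the first state
    for t in range(T):
        x, p = xp[t], pp[t]
        if not np.isnan(y[t]):           # measurement update only if observed
            k = p / (p + r)              # Kalman gain
            x, p = x + k * (y[t] - x), (1.0 - k) * p
        xf[t], pf[t] = x, p
        if t + 1 < T:                    # time update (prediction)
            xp[t + 1], pp[t + 1] = a * x, a * a * p + q
    xs, ps = xf.copy(), pf.copy()        # backward (smoothing) pass
    for t in range(T - 2, -1, -1):
        c = a * pf[t] / pp[t + 1]        # smoother gain
        xs[t] = xf[t] + c * (xs[t + 1] - xp[t + 1])
        ps[t] = pf[t] + c * c * (ps[t + 1] - pp[t + 1])
    return xs, ps
```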


Boltzmann Machine Learning Using Mean Field Theory and Linear Response Correction

Neural Information Processing Systems

We present a new approximate learning algorithm for Boltzmann Machines, using a systematic expansion of the Gibbs free energy to second order in the weights. The linear response correction to the correlations is given by the Hessian of the Gibbs free energy. The computational complexity of the algorithm is cubic in the number of neurons. We compare the performance of the exact BM learning algorithm with first order (Weiss) mean field theory and second order (TAP) mean field theory. The learning task consists of a fully connected Ising spin glass model on 10 neurons. We conclude that 1) the method works well for paramagnetic problems, 2) the TAP correction gives a significant improvement over the Weiss mean field theory, both for paramagnetic and spin glass problems, and 3) the inclusion of diagonal weights improves the Weiss approximation for paramagnetic problems, but not for spin glass problems.
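
A minimal sketch of the two ingredients, under assumed conventions (not the authors' code): the Weiss mean field fixed point, and the linear response correlation estimate obtained by inverting the Hessian of the Gibbs free energy, whose inversion accounts for the cubic cost quoted above. The TAP variant would add the Onsager reaction term to the fixed-point equation.

```python
import numpy as np

def weiss_mean_field(w, theta, n_iter=200, damping=0.5):
    """Damped fixed-point iteration m_i = tanh(sum_j w_ij m_j + theta_i).
    First-order (Weiss) mean field; TAP adds the Onsager correction."""
    m = np.zeros(len(theta))
    for _ in range(n_iter):
        m = damping * m + (1.0 - damping) * np.tanh(w @ m + theta)
    return m

def linear_response_correlations(w, m):
    """Linear response estimate: (chi^-1)_ij = delta_ij/(1 - m_i^2) - w_ij,
    and <s_i s_j> is approximated by chi_ij + m_i m_j."""
    a = np.diag(1.0 / (1.0 - m ** 2)) - w
    chi = np.linalg.inv(a)               # O(n^3), matching the cubic cost
    return chi + np.outer(m, m)
```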


A Simple and Fast Neural Network Approach to Stereovision

Neural Information Processing Systems

A neural network approach to stereovision is presented, based on aliasing effects of simple disparity estimators and a fast coherence-detection scheme. Within a single network structure, a dense disparity map with an associated validation map and, additionally, the fused cyclopean view of the scene are available. The network operations are based on simple, biologically plausible circuitry; the algorithm is fully parallel and non-iterative.

1 Introduction

Humans experience the three-dimensional world not as it is seen by either their left or right eye, but from the position of a virtual cyclopean eye, located in the middle between the two real eye positions. The different perspectives of the left and right eyes cause slight relative displacements of objects in the two retinal images (disparities), which make a simple superposition of both images without diplopia impossible. Proper fusion of the retinal images into the cyclopean view requires the registration of both images to a common coordinate system, which in turn requires calculation of disparities for all image areas which are to be fused.
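
The paper's coherence-detection network is not reproduced here; as a point of reference only, the sketch below is a conventional window-based block-matching estimator that produces the same two outputs named in the abstract, a dense disparity map and a validation map (a match is accepted only when the best cost clearly beats the runner-up). All parameters are illustrative.

```python
import numpy as np
from scipy.signal import convolve2d

def block_matching_disparity(left, right, max_disp=16, win=5):
    """Dense disparity map from a rectified grayscale pair via windowed SAD.
    A conventional baseline, not the paper's coherence-detection scheme."""
    h, w = left.shape
    kernel = np.ones((win, win))
    cost = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        # left pixel x matches right pixel x - d on a rectified pair
        diff = np.abs(left[:, d:] - right[:, :w - d]) if d else np.abs(left - right)
        cost[d, :, d:] = convolve2d(diff, kernel, mode="same")
    disparity = cost.argmin(axis=0)
    # validation map: best cost must clearly beat the second-best candidate
    csort = np.sort(cost, axis=0)
    valid = csort[0] < 0.9 * csort[1]
    return disparity, valid
```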


Using Expectation to Guide Processing: A Study of Three Real-World Applications

Neural Information Processing Systems

In many real world tasks, only a small fraction of the available inputs are important at any particular time. This paper presents a method for ascertaining the relevance of inputs by exploiting temporal coherence and predictability. The method proposed in this paper dynamically allocates relevance to inputs by using expectations of their future values. As a model of the task is learned, the model is simultaneously extended to create task-specific predictions of the future values of inputs. Inputs which are either not relevant, and therefore not accounted for in the model, or which contain noise, will not be predicted accurately. These inputs can be de-emphasized and, in turn, a new, improved model of the task created.
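
A minimal sketch of the expectation-based relevance idea: track each input's one-step prediction error and attenuate inputs the model cannot predict. The gating form 1/(1 + err) and all names are assumptions for illustration, not the paper's method.

```python
import numpy as np

class RelevanceGate:
    """Down-weights inputs whose future values the learned model fails
    to predict, on the premise that unpredictable inputs are irrelevant
    or noisy. Per-input squared prediction error is tracked with an
    exponential moving average."""
    def __init__(self, n_inputs, decay=0.99):
        self.err = np.zeros(n_inputs)
        self.decay = decay

    def update(self, predicted, observed):
        # track how well each input was predicted one step ahead
        self.err = self.decay * self.err \
            + (1.0 - self.decay) * (predicted - observed) ** 2
        return self.gate(observed)

    def gate(self, x):
        # emphasize predictable inputs, de-emphasize unmodeled/noisy ones
        return x / (1.0 + self.err)
```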


Correlates of Attention in a Model of Dynamic Visual Recognition

Neural Information Processing Systems

Given a set of objects in the visual field, how does the visual system learn to attend to a particular object of interest while ignoring the rest? In this paper, we attempt to answer this question in the context of a Kalman filter-based model of visual recognition that has previously proved useful in explaining certain neurophysiological phenomena such as endstopping and related extra-classical receptive field effects in the visual cortex. The resulting robust Kalman filter model demonstrates how certain forms of attention can be viewed as an emergent property of the interaction between top-down expectations and bottom-up signals. The model also suggests functional interpretations of certain attention-related effects that have been observed in visual cortical neurons. Experimental results are provided to help demonstrate the ability of the model to perform robust segmentation and recognition of objects and image sequences in the presence of varying degrees of occlusion and clutter.

1 INTRODUCTION

The human visual system possesses the remarkable ability to recognize objects despite the presence of distractors and occluders in the field of view.
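
A minimal sketch of a robust Kalman measurement update in the spirit the abstract describes: innovations are re-weighted Huber-style so that grossly violated pixels (occlusion, clutter) barely move the state estimate, letting the top-down prediction dominate there. The weighting scheme and the diagonal measurement noise are assumptions, not the paper's exact formulation.

```python
import numpy as np

def robust_kalman_update(x, P, y, H, r_var, c=2.5):
    """One robust measurement update: state mean x (n,), covariance P (n,n),
    measurement y (m,), measurement matrix H (m,n), per-component measurement
    noise variances r_var (m,). Large innovations are down-weighted."""
    r = y - H @ x                                    # innovation (residual)
    s = np.sqrt(np.einsum('ij,jk,ik->i', H, P, H) + r_var)  # innovation stds
    w = np.where(np.abs(r) <= c * s, 1.0, c * s / np.abs(r))  # Huber weights
    # inflate noise on outlier components so they barely influence the update
    S = H @ P @ H.T + np.diag(r_var / np.maximum(w, 1e-8))
    K = P @ H.T @ np.linalg.inv(S)                   # robust Kalman gain
    return x + K @ r, (np.eye(len(x)) - K @ H) @ P
```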


The Error Coding and Substitution PaCTs

Neural Information Processing Systems

A new class of plug-in classification techniques has recently been developed in the statistics and machine learning literature. A plug-in classification technique (PaCT) is a method that takes a standard classifier (such as LDA or TREES) and plugs it into an algorithm to produce a new classifier. The standard classifier is known as the Plug-in Classifier (PiC). These methods often produce large improvements over using a single classifier. In this paper we investigate one of these methods and give some motivation for its success.
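
One well-known PaCT of this kind is error-correcting output coding (ECOC), which assigns each class a binary codeword and trains one PiC per code bit. The sketch below assumes a {0,1} codeword matrix and a user-supplied PiC factory with fit/predict methods; it illustrates the plug-in pattern rather than the authors' implementation.

```python
import numpy as np

def train_ecoc(make_pic, X, y, code):
    """code: (n_classes, n_bits) matrix of {0,1} codewords; y: integer class
    labels. One plug-in classifier (PiC) is trained per bit on the relabeled
    binary problem; make_pic() must return an object with fit/predict."""
    pics = []
    for b in range(code.shape[1]):
        bit_labels = code[y, b]          # relabel each example by its bit
        clf = make_pic()
        clf.fit(X, bit_labels)
        pics.append(clf)
    return pics

def predict_ecoc(pics, code, X):
    # stack per-bit predictions, then pick the class with the nearest codeword
    bits = np.column_stack([clf.predict(X) for clf in pics])
    dists = np.abs(bits[:, None, :] - code[None, :, :]).sum(axis=2)  # Hamming
    return dists.argmin(axis=1)
```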


Statistical Models of Conditioning

Neural Information Processing Systems

Conditioning experiments probe the ways that animals make predictions about rewards and punishments and use those predictions to control their behavior. One standard model of conditioning paradigms which involve many conditioned stimuli suggests that individual predictions should be added together. Various key results show that this model fails in some circumstances, and motivate an alternative model, in which there is attentional selection between different available stimuli. The new model is a form of mixture of experts, has a close relationship with some other existing psychological suggestions, and is statistically well-founded.
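
As a contrast between the two model classes the abstract mentions, here is a minimal sketch: an additive (Rescorla-Wagner-style) predictor trained with the delta rule, and a generic softmax-gated mixture that selects among per-stimulus predictions rather than summing them. Both forms are illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

def rescorla_wagner(stimuli, rewards, lr=0.1):
    """Additive model: the reward prediction is the sum of per-stimulus
    weights (w @ x), trained with the delta rule on each trial.
    stimuli: (T, n) binary presence matrix; rewards: (T,) outcomes."""
    w = np.zeros(stimuli.shape[1])
    for x, r in zip(stimuli, rewards):
        w += lr * (r - w @ x) * x        # delta rule on the summed prediction
    return w

def mixture_prediction(w, x, beta=3.0):
    """Attentional alternative: a softmax gate selects among the individual
    per-stimulus predictions instead of adding them (a generic
    mixture-of-experts sketch)."""
    preds = w * x                        # each stimulus's own prediction
    gate = x * np.exp(beta * preds)      # only present stimuli compete
    gate /= max(gate.sum(), 1e-8)
    return gate @ preds
```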


Bayesian Model of Surface Perception

Neural Information Processing Systems

Image intensity variations can result from several different object surface effects, including shading from the 3-dimensional relief of the object or paint on the surface itself. An essential problem in vision, which people solve naturally, is to attribute the proper physical cause, e.g.


How to Dynamically Merge Markov Decision Processes

Neural Information Processing Systems

We are frequently called upon to perform multiple tasks that compete for our attention and resources. Often we know the optimal solution to each task in isolation; in this paper, we describe how this knowledge can be exploited to efficiently find good solutions for doing the tasks in parallel. We formulate this problem as that of dynamically merging multiple Markov decision processes (MDPs) into a composite MDP, and present a new theoretically-sound dynamic programming algorithm for finding an optimal policy for the composite MDP. We analyze various aspects of our algorithm and illustrate its use on a simple merging problem.

Every day, we are faced with the problem of doing multiple tasks in parallel, each of which competes for our attention and resources. If we are running a job shop, we must decide which machines to allocate to which jobs, and in what order, so that no jobs miss their deadlines. If we are a mail delivery robot, we must find the intended recipients of the mail while simultaneously avoiding fixed obstacles (such as walls) and mobile obstacles (such as people), and still manage to keep ourselves sufficiently charged up. Frequently we know how to perform each task in isolation; this paper considers how we can take the information we have about the individual tasks and combine it to efficiently find an optimal solution for doing the entire set of tasks in parallel. More importantly, we describe a theoretically-sound algorithm for doing this merging dynamically; new tasks (such as a new job arrival at a job shop) can be assimilated online into the solution being found for the ongoing set of simultaneous tasks.
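
For concreteness, here is a brute-force baseline: value iteration on the static cross product of the individual MDPs, assuming a shared action set, independent transitions, and additive rewards. This sketch only fixes the semantics of the composite MDP; it does not capture the paper's central contribution of merging dynamically and exploiting the individual task solutions.

```python
import numpy as np
from itertools import product

def merge_mdps(mdps, gamma=0.95, n_iter=200):
    """Each mdp is a pair (P, R) with P[a, s, s'] transition probabilities
    and R[s, a] rewards, all sharing one action set. Runs value iteration
    on the cross-product state space; cost grows as the product of the
    individual state-space sizes, which is why a smarter dynamic merging
    algorithm is needed in practice."""
    states = [range(m[0].shape[1]) for m in mdps]
    n_actions = mdps[0][0].shape[0]
    joint = list(product(*states))           # composite states are tuples
    V = {s: 0.0 for s in joint}
    for _ in range(n_iter):
        for s in joint:
            q = np.zeros(n_actions)
            for a in range(n_actions):
                r = sum(m[1][si, a] for m, si in zip(mdps, s))  # additive reward
                ev = 0.0                      # expectation over independent
                for s2 in joint:              # per-task transitions
                    p = 1.0
                    for m, si, si2 in zip(mdps, s, s2):
                        p *= m[0][a, si, si2]
                    ev += p * V[s2]
                q[a] = r + gamma * ev
            V[s] = q.max()
    return V
```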