Goto



The Neurothermostat: Predictive Optimal Control of Residential Heating Systems

Neural Information Processing Systems

The Neurothermostat is an adaptive controller that regulates indoor air temperature in a residence by switching a furnace on or off. The task is framed as an optimal control problem in which both comfort and energy costs are considered as part of the control objective. Because the consequences of control decisions are delayed in time, the Neurothermostat must anticipate heating demands with predictive models of occupancy patterns and the thermal response of the house and furnace.
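
The abstract describes the control objective only in words; the toy sketch below illustrates the kind of horizon-based trade-off it implies, where an on/off action is chosen to minimize predicted energy cost plus expected occupant discomfort. All names, constants, and the two-action thermal forecast are assumptions for illustration, not the Neurothermostat's actual models.

# Illustrative sketch (not the paper's code): choosing a furnace action by
# minimizing expected cost = energy cost + expected occupant discomfort.
# All names, constants, and the simple forecasts are assumptions.

def expected_cost(action_on, temps_if_on, temps_if_off, occupancy_prob,
                  energy_cost_per_step=0.1, discomfort_weight=0.05,
                  setpoint=21.0):
    """Sum energy and expected discomfort cost over a prediction horizon."""
    temps = temps_if_on if action_on else temps_if_off
    cost = 0.0
    for temp, p_occ in zip(temps, occupancy_prob):
        if action_on:
            cost += energy_cost_per_step          # cost of running the furnace
        # discomfort only matters when someone is (probably) home
        cost += discomfort_weight * p_occ * max(0.0, setpoint - temp) ** 2
    return cost

# Toy predictions from (hypothetical) occupancy and thermal models
temps_if_on = [19.5, 20.5, 21.5, 22.0]
temps_if_off = [19.0, 18.5, 18.0, 17.5]
occupancy_prob = [0.1, 0.2, 0.8, 0.9]

costs = {a: expected_cost(a, temps_if_on, temps_if_off, occupancy_prob)
         for a in (True, False)}
best_action = min(costs, key=costs.get)
print("turn furnace on" if best_action else "leave furnace off", costs)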


Bayesian Model Comparison by Monte Carlo Chaining

Neural Information Processing Systems

Neural Computing Research Group, Aston University, Birmingham, B4 7ET, U.K. http://www.ncrg.aston.ac.uk/ Abstract The techniques of Bayesian inference have been applied with great success to many problems in neural computing including evaluation of regression functions, determination of error bars on predictions, and the treatment of hyper-parameters. However, the problem of model comparison is a much more challenging one for which current techniques have significant limitations. In this paper we show how an extended form of Markov chain Monte Carlo, called chaining, is able to provide effective estimates of the relative probabilities of different models. We present results from the robot arm problem and compare them with the corresponding results obtained using the standard Gaussian approximation framework. Initially this is chosen to be some prior distribution p(w|M), which can be combined with a likelihood function p(D|w, M) using Bayes' theorem to give a posterior distribution p(w|D, M) in the form

p(w|D, M) = p(D|w, M) p(w|M) / p(D|M)    (1)

where D is the data set.
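
The abstract does not spell out the chaining construction itself; as a rough sketch of the general idea (notation mine, not necessarily the paper's), the ratio of normalizing constants between a tractable reference distribution and the unnormalized posterior is factored into a product of nearby ratios, each estimated by ordinary importance sampling from the preceding link:

\[
  \frac{Z_K}{Z_0} \;=\; \prod_{k=1}^{K} \frac{Z_k}{Z_{k-1}},
  \qquad
  \frac{Z_k}{Z_{k-1}} \;\approx\; \frac{1}{S} \sum_{s=1}^{S}
    \frac{p_k^{*}\!\big(w^{(s)}\big)}{p_{k-1}^{*}\!\big(w^{(s)}\big)},
  \quad w^{(s)} \sim p_{k-1}(w),
\]

where the p_k^* are unnormalized densities interpolating between the prior (k = 0) and the posterior (k = K); if p_0 is the normalized prior and p_K^* the likelihood times the prior, Z_K/Z_0 equals the evidence p(D|M) needed for model comparison.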


A Model of Recurrent Interactions in Primary Visual Cortex

Neural Information Processing Systems

A general feature of the cerebral cortex is its massive interconnectivity: it has been estimated anatomically [19] that cortical neurons receive upwards of 5,000 synapses, the majority of which originate from other nearby cortical neurons. Numerous experiments in primary visual cortex (V1) have revealed strongly nonlinear interactions between stimulus elements which activate classical and nonclassical receptive field regions.


Second-order Learning Algorithm with Squared Penalty Term

Neural Information Processing Systems

This paper compares three penalty terms with respect to the efficiency of supervised learning, by using first- and second-order learning algorithms. Our experiments showed that for a reasonably adequate penalty factor, the combination of the squared penalty term and the second-order learning algorithm drastically improves the convergence performance, more than 20 times over the other combinations, while at the same time bringing about better generalization performance.
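
As a minimal illustration of what a squared penalty term combined with a second-order update looks like (a generic sketch under assumed notation, not the paper's algorithm), consider a sum-of-squares error with a squared weight penalty minimized by a Newton step:

# Illustrative sketch, not the paper's algorithm: a squared (weight-decay)
# penalty added to a sum-of-squares error, minimized with a Newton step.
# For a linear model the Hessian is exact; names and values are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # inputs
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)
lam = 1e-2                              # penalty factor

w = np.zeros(5)
grad = X.T @ (X @ w - y) + lam * w      # gradient of 0.5*||Xw - y||^2 + 0.5*lam*||w||^2
hess = X.T @ X + lam * np.eye(5)        # exact Hessian of the penalized quadratic
w = w - np.linalg.solve(hess, grad)     # second-order (Newton) update
print(w)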


Bayesian Unsupervised Learning of Higher Order Structure

Neural Information Processing Systems

Many real world patterns have a hierarchical underlying structure in which simple features have a higher order structure among themselves. Because these relationships are often statistical in nature, it is natural to view the process of discovering such structures as a statistical inference problem in which a hierarchical model is fit to data. Hierarchical statistical structure can be conveniently represented with Bayesian belief networks (Pearl, 1988; Lauritzen and Spiegelhalter, 1988; Neal, 1992). These models are powerful, because they can capture complex statistical relationships among the data variables, and also mathematically convenient, because they allow efficient computation of the joint probability for any given set of model parameters. The joint probability of a network of binary states is given by a product of conditional probabilities,

P(s_1, ..., s_N | W) = prod_i P(s_i | parents(s_i), W)    (1)

where W is the weight matrix that parameterizes the model. Note that the probability of an individual state s_i depends only on its parents.
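
A minimal sketch of the factorization in (1), assuming the standard sigmoid parameterization of the conditionals (Neal, 1992); the tiny network, weights, and variable names below are illustrative only, not taken from the paper:

# Minimal sketch of the belief-network factorization in (1) with sigmoid
# conditionals. The network, weights, and biases here are illustrative.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# parents[i] lists the parent indices of unit i; weights[i][j] is the weight
# from the j-th parent to unit i; bias[i] is unit i's bias.
parents = {0: [], 1: [0], 2: [0, 1]}
weights = {0: [], 1: [2.0], 2: [-1.0, 3.0]}
bias = {0: -0.5, 1: 0.0, 2: 0.5}

def joint_probability(s):
    """P(s | W) = prod_i P(s_i | parents(s_i), W) for binary states s."""
    p = 1.0
    for i in sorted(parents):
        activation = bias[i] + sum(w * s[j] for w, j in zip(weights[i], parents[i]))
        p_on = sigmoid(activation)
        p *= p_on if s[i] == 1 else (1.0 - p_on)
    return p

print(joint_probability([1, 0, 1]))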


Online Learning from Finite Training Sets: An Analytical Case Study

Neural Information Processing Systems

By an extension of statistical mechanics methods, we obtain exact results for the time-dependent generalization error of a linear network with a large number of weights N. We find, for example, that for small training sets of size p ≈ N, larger learning rates can be used without compromising asymptotic generalization performance or convergence speed. Encouragingly, for optimal settings of the learning rate η (and, less importantly, the weight decay λ) at given final learning time, the generalization performance of online learning is essentially as good as that of offline learning.
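
For concreteness, the setting studied can be sketched as follows (an illustrative toy under assumed notation, not the paper's analysis): a linear student with N weights is trained by online gradient descent with learning rate η and weight decay λ on examples drawn repeatedly from a fixed set of p training points.

# Toy sketch of online learning from a finite training set; all sizes,
# the teacher construction, and hyperparameter values are assumptions.
import numpy as np

rng = np.random.default_rng(1)
N, p, eta, lam = 50, 50, 0.1, 0.01
teacher = rng.normal(size=N) / np.sqrt(N)
X = rng.normal(size=(p, N))
y = X @ teacher                          # noiseless teacher outputs

w = np.zeros(N)
for step in range(5000):
    i = rng.integers(p)                  # draw one training example at random
    err = w @ X[i] - y[i]
    w -= eta * (err * X[i] + lam * w)    # online update with weight decay

X_test = rng.normal(size=(1000, N))
gen_error = 0.5 * np.mean((X_test @ (w - teacher)) ** 2)
print(gen_error)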


Continuous Sigmoidal Belief Networks Trained using Slice Sampling

Neural Information Processing Systems

These include Boltzmann machines (Hinton and Sejnowski 1986), binary sigmoidal belief networks (Neal 1992) and Helmholtz machines (Hinton et al. 1995; Dayan et al. 1995). However, some hidden variables, such as translation or scaling in images of shapes, are best represented using continuous values. Continuous-valued Boltzmann machines have been developed (Movellan and McClelland 1993), but these suffer from long simulation settling times and the requirement of a "negative phase" during learning. Tibshirani (1992) and Bishop et al. (1996) consider learning mappings from a continuous latent variable space to a higher-dimensional input space. MacKay (1995) has developed "density networks" that can model both continuous and categorical latent spaces using stochasticity at the topmost network layer. In this paper I consider a new hierarchical top-down connectionist model that has stochastic hidden variables at all layers; moreover, these variables can adapt to be continuous or categorical. The proposed top-down model can be viewed as a continuous-valued belief network, which can be simulated by performing a quick top-down pass (Pearl 1988).
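
The abstract does not describe the slice sampling procedure named in the title; as a generic reminder (a standard univariate stepping-out and shrinkage sampler, which may differ in detail from the paper's procedure), one Markov chain Monte Carlo step looks like this:

# Generic univariate slice sampling step; f is an unnormalized density and
# x the current value. Step sizes and the demo target are assumptions.
import math, random

def slice_sample_step(f, x, width=1.0):
    u = random.uniform(0.0, f(x))            # auxiliary height under the density
    # step out an interval [lo, hi] that contains the slice {x': f(x') > u}
    lo = x - width * random.random()
    hi = lo + width
    while f(lo) > u:
        lo -= width
    while f(hi) > u:
        hi += width
    # shrink the interval until a point inside the slice is found
    while True:
        x_new = random.uniform(lo, hi)
        if f(x_new) > u:
            return x_new
        if x_new < x:
            lo = x_new
        else:
            hi = x_new

# Example: draw from a standard Gaussian via its unnormalized density
x, samples = 0.0, []
for _ in range(1000):
    x = slice_sample_step(lambda t: math.exp(-0.5 * t * t), x)
    samples.append(x)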


A Comparison between Neural Networks and other Statistical Techniques for Modeling the Relationship between Tobacco and Alcohol and Cancer

Neural Information Processing Systems

Pierre Band, BC Cancer Agency, Epidemiology, 601 West 10th Ave, Vancouver BC Canada V5Z 1L3; Joel Bert, Dept of Chemical Engineering, University of British Columbia, 2216 Main Mall, Vancouver BC Canada V6T 1Z4; John Grace, Dept of Chemical Engineering, University of British Columbia, 2216 Main Mall, Vancouver BC Canada V6T 1Z4. Abstract Epidemiological data is traditionally analyzed with very simple techniques. Flexible models, such as neural networks, have the potential to discover unanticipated features in the data. However, to be useful, flexible models must have effective control on overfitting. This paper reports on a comparative study of the predictive quality of neural networks and other flexible models applied to real and artificial epidemiological data. The results suggest that there are no major unanticipated complex features in the real data, and also demonstrate that MacKay's [1995] Bayesian neural network methodology provides effective control on overfitting while retaining the ability to discover complex features in the artificial data. 1 Introduction Traditionally, very simple statistical techniques are used in the analysis of epidemiological studies. The predominant technique is logistic regression, in which the effects of predictors are linear (or categorical) and additive on the log-odds scale.
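
For concreteness, the logistic-regression baseline referred to here models the log-odds of disease as a linear, additive function of the predictors (a standard formulation; the symbols below are generic, not the paper's):

\[
  \log \frac{P(\text{case} \mid x_1, \dots, x_k)}{1 - P(\text{case} \mid x_1, \dots, x_k)}
  \;=\; \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k .
\]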


Edges are the 'Independent Components' of Natural Scenes.

Neural Information Processing Systems

Field (1994) has suggested that neurons with line and edge selectivities found in primary visual cortex of cats and monkeys form a sparse, distributed representation of natural scenes, and Barlow (1989) has reasoned that such responses should emerge from an unsupervised learning algorithm that attempts to find a factorial code of independent visual features. We show here that nonlinear 'infomax', when applied to an ensemble of natural scenes, produces sets of visual filters that are localised and oriented. Some of these filters are Gabor-like and resemble those produced by the sparseness-maximisation network of Olshausen & Field (1996). In addition, the outputs of these filters are as independent as possible, since the infomax network is able to perform Independent Components Analysis (ICA). We compare the resulting ICA filters and their associated basis functions, with other decorrelating filters produced by Principal Components Analysis (PCA) and zero-phase whitening filters (ZCA). The ICA filters have more sparsely distributed (kurtotic) outputs on natural scenes. They also resemble the receptive fields of simple cells in visual cortex, which suggests that these neurons form an information-theoretic coordinate system for images. 1 Introduction. Both the classic experiments of Hubel & Wiesel [8] on neurons in visual cortex, and several decades of theorising about feature detection in vision, have left open the question most succinctly phrased by Barlow: "Why do we have edge detectors?" That is: are there any coding principles which would predict the formation of localised, oriented receptive fields?
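
A common form of the nonlinear infomax update with a logistic nonlinearity and a natural-gradient correction is sketched below; the stand-in data (in place of whitened natural-scene patches), batch size, and learning rate are assumptions for illustration, not the paper's settings.

# Sketch of a natural-gradient infomax ICA update with a logistic
# nonlinearity. Data, patch size, and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 16                                   # e.g. flattened 4x4 image patches
X = rng.laplace(size=(n, 10000))         # stand-in for whitened natural-scene patches

W = np.eye(n)                            # unmixing (filter) matrix
eta = 0.01
for _ in range(200):
    idx = rng.integers(X.shape[1], size=100)   # small batch of patches
    U = W @ X[:, idx]                          # filter outputs
    Y = 1.0 / (1.0 + np.exp(-U))               # logistic nonlinearity
    dW = (np.eye(n) + (1.0 - 2.0 * Y) @ U.T / idx.size) @ W
    W += eta * dW                              # natural-gradient infomax step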


A Variational Principle for Model-based Morphing

Neural Information Processing Systems

Given a multidimensional data set and a model of its density, we consider how to define the optimal interpolation between two points. This is done by assigning a cost to each path through space, based on two competing goals: one to interpolate through regions of high density, the other to minimize arc length. From this path functional, we derive the Euler-Lagrange equations for extremal motion; given two points, the desired interpolation is found by solving a boundary value problem. We show that this interpolation can be done efficiently, in high dimensions, for Gaussian, Dirichlet, and mixture models. 1 Introduction The problem of nonlinear interpolation arises frequently in image, speech, and signal processing. Consider the following two examples: (i) given two profiles of the same face, connect them by a smooth animation of intermediate poses [1]; (ii) given a telephone signal masked by intermittent noise, fill in the missing speech. Both these examples may be viewed as instances of the same abstract problem. In qualitative terms, we can state the problem as follows [2]: given a multidimensional data set, and two points from this set, find a smooth adjoining path that is consistent with available models of the data. We will refer to this as the problem of model-based morphing. In this paper, we examine this problem as it arises from statistical models of multidimensional data. Specifically, our focus is on models that have been derived from
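
The abstract does not give the path functional explicitly; one natural form consistent with the two competing goals it describes (an assumed illustration, not necessarily the paper's exact cost) penalizes path length while rewarding passage through high-density regions:

\[
  E[\mathbf{x}] \;=\; \int_0^1 \Big( \tfrac{1}{2}\,\big\|\dot{\mathbf{x}}(t)\big\|^2
  \;-\; \lambda \log p\big(\mathbf{x}(t)\big) \Big)\, dt ,
\]

with the endpoints x(0) and x(1) fixed at the two given points; setting the first variation to zero yields Euler-Lagrange equations whose solution, subject to those boundary conditions, is the desired interpolant.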