Plotting

 Sejnowski, Terrence J.


Using Feedforward Neural Networks to Monitor Alertness from Changes in EEG Correlation and Coherence

Neural Information Processing Systems

We report here that changes in the normalized electroencephalographic (EEG)cross-spectrum can be used in conjunction with feedforward neural networks to monitor changes in alertness of operators continuouslyand in near-real time. Previously, we have shown that EEG spectral amplitudes covary with changes in alertness asindexed by changes in behavioral error rate on an auditory detection task [6,4]. Here, we report for the first time that increases in the frequency of detection errors in this task are also accompanied bypatterns of increased and decreased spectral coherence in several frequency bands and EEG channel pairs. Relationships between EEG coherence and performance vary between subjects, but within subjects, their topographic and spectral profiles appear stable from session to session. Changes in alertness also covary with changes in correlations among EEG waveforms recorded at different scalp sites, and neural networks can also estimate alertness fromcorrelation changes in spontaneous and unobtrusivelyrecorded EEGsignals. 1 Introduction When humans become drowsy, EEG scalp recordings of potential oscillations change dramatically in frequency, amplitude, and topographic distribution [3]. These changes are complex and differ between subjects [10]. Recently, we have shown 932 S.MAKEIG, T.-P.


Tempering Backpropagation Networks: Not All Weights are Created Equal

Neural Information Processing Systems

Backpropagation learning algorithms typically collapse the network's structure into a single vector of weight parameters to be optimized. We suggest that their performance may be improved by utilizing the structural informationinstead of discarding it, and introduce a framework for ''tempering'' each weight accordingly. In the tempering model, activation and error signals are treated as approximately independentrandom variables. The characteristic scale of weight changes is then matched to that ofthe residuals, allowing structural properties suchas a node's fan-in and fan-out to affect the local learning rate and backpropagated error. The model also permits calculation of an upper bound on the global learning rate for batch updates, which in turn leads to different update rules for bias vs. non-bias weights. This approach yields hitherto unparalleled performance on the family relations benchmark,a deep multi-layer network: for both batch learning with momentum and the delta-bar-delta algorithm, convergence at the optimal learning rate is sped up by more than an order of magnitude.


A Mixture Model System for Medical and Machine Diagnosis

Neural Information Processing Systems

Diagnosis of human disease or machine fault is a missing data problem since many variables are initially unknown. Additional information needs to be obtained. The j oint probability distribution of the data can be used to solve this problem. We model this with mixture models whose parameters are estimated by the EM algorithm. This gives the benefit that missing data in the database itself can also be handled correctly. The request for new information to refine the diagnosis is performed using the maximum utility principle. Since the system is based on learning it is domain independent and less labor intensive than expert systems or probabilistic networks. An example using a heart disease database is presented.


Spatial Representations in the Parietal Cortex May Use Basis Functions

Neural Information Processing Systems

The parietal cortex is thought to represent the egocentric positions ofobjects in particular coordinate systems. We propose an alternative approach to spatial perception of objects in the parietal cortexfrom the perspective of sensorimotor transformations. The responses of single parietal neurons can be modeled as a gaussian functionof retinal position multiplied by a sigmoid function of eye position, which form a set of basis functions. We show here how these basis functions can be used to generate receptive fields in either retinotopic or head-centered coordinates by simple linear transformations. This raises the possibility that the parietal cortex does not attempt to compute the positions of objects in a particular frameof reference but instead computes a general purpose representation of the retinal location and eye position from which any transformation can be synthesized by direct projection. This representation predicts that hemineglect, a neurological syndrome produced by parietal lesions, should not be confined to egocentric coordinates, but should be observed in multiple frames of reference in single patients, a prediction supported by several experiments.


A Novel Reinforcement Model of Birdsong Vocalization Learning

Neural Information Processing Systems

Songbirds learn to imitate a tutor song through auditory and motor learning. Wehave developed a theoretical framework for song learning that accounts for response properties of neurons that have been observed in many of the nuclei that are involved in song learning. Specifically, we suggest that the anteriorforebrain pathway, which is not needed for song production in the adult but is essential for song acquisition, provides synaptic perturbations and adaptive evaluations for syllable vocalization learning. A computer model based on reinforcement learning was constructed thatcould replicate a real zebra finch song with 90% accuracy based on a spectrographic measure.


A Mixture Model System for Medical and Machine Diagnosis

Neural Information Processing Systems

Diagnosis of human disease or machine fault is a missing data problem since many variables are initially unknown. Additional information needs to be obtained. The j oint probability distribution of the data can be used to solve this problem. We model this with mixture models whose parameters are estimated by the EM algorithm. This gives the benefit that missing data in the database itself can also be handled correctly. The request for new information to refine the diagnosis is performed using the maximum utility principle. Since the system is based on learning it is domain independent and less labor intensive than expert systems or probabilistic networks. An example using a heart disease database is presented.


Grouping Components of Three-Dimensional Moving Objects in Area MST of Visual Cortex

Neural Information Processing Systems

A number of studies have described neurons in the dorsal part of the medial superior temporal (MSTd) monkey cortex that respond best to large expanding/contracting, rotating, or shifting patterns (Tanaka et al., 1986; Duffy & Wurtz, 1991a). Recently Graziano et al. (1994) found that MSTd cell responses correspond to a point in a multidimensional space of spiral motions, where the dimensions are these motion types. Combinationsof these motions are generated as an animal moves through its environment, whichsuggests that area MSTd could playa role in optical flow analysis. When an observer moves through a static environment, a singularity in the flow field known as the focus of expansion may be used to determine the direction of heading (Gibson, 1950; Warren & Hannon, 1988). Previous computational models of MSTd (Lappe & Rauschecker, 1993; Perrone & Stone, 1994) have shown how navigational information related to heading may be encoded by these cells.


Plasticity-Mediated Competitive Learning

Neural Information Processing Systems

Differentiation between the nodes of a competitive learning network isconventionally achieved through competition on the basis of neural activity. Simple inhibitory mechanisms are limited to sparse representations, while decorrelation and factorization schemes that support distributed representations are computationally unattractive.By letting neural plasticity mediate the competitive interactioninstead, we obtain diffuse, nonadaptive alternatives forfully distributed representations. We use this technique to Simplify and improve our binary information gain optimization algorithmfor feature extraction (Schraudolph and Sejnowski, 1993); the same approach could be used to improve other learning algorithms. 1 INTRODUCTION Unsupervised neural networks frequently employ sets of nodes or subnetworks with identical architecture and objective function. Some form of competitive interaction isthen needed for these nodes to differentiate and efficiently complement each other in their task.


Reinforcement Learning Predicts the Site of Plasticity for Auditory Remapping in the Barn Owl

Neural Information Processing Systems

In young barn owls raised with optical prisms over their eyes, these auditory maps are shifted to stay in register with the visual map, suggesting that the visual input imposes a frame of reference on the auditory maps. However, the optic tectum, the first site of convergence of visual with auditory information, is not the site of plasticity for the shift of the auditory maps; the plasticity occurs instead in the inferior colliculus, which contains an auditory map and projects into the optic tectum. We explored a model of the owl remapping in which a global reinforcement signal whose delivery is controlled by visual foveation. A hebb learning rule gated by reinforcement learnedto appropriately adjust auditory maps. In addition, reinforcement learning preferentially adjusted the weights in the inferior colliculus, as in the owl brain, even though the weights were allowed to change throughout the auditory system. This observation raisesthe possibility that the site of learning does not have to be genetically specified, but could be determined by how the learning procedure interacts with the network architecture.


A Non-linear Information Maximisation Algorithm that Performs Blind Separation

Neural Information Processing Systems

With the exception of (Becker 1992), there has been little attempt to use non-linearity in networks to achieve something a linear network could not. Nonlinear networks, however, are capable of computing more general statistics than those second-order ones involved in decorrelation, and as a consequence they are capable of dealing with signals (and noises) which have detailed higher-order structure. The success of the'H-J' networks at blind separation (Jutten & Herault 1991)suggests that it should be possible to separate statistically independent components, by using learning rules which make use of moments of all orders. This paper takes a principled approach to this problem, by starting with the question ofhow to maximise the information passed on in nonlinear feed-forward network. Startingwith an analysis of a single unit, the approach is extended to a network mapping N inputs to N outputs. In the process, it will be shown that, under certain fairly weak conditions, the N ---. N network forms a minimally redundant encodingofthe inputs, and that it therefore performs Independent Component Analysis (ICA). 2 Information maximisation The information that output Y contains about input X is defined as: I(Y, X) H(Y) - H(YIX) (1) where H(Y) is the entropy (information) in the output, while H(YIX) is whatever information the output has which didn't come from the input. In the case that we have no noise (or rather, we don't know what is noise and what is signal in the input), the mapping between X and Y is deterministic and H(YIX) has its lowest possible value of