AITopics

The purpose of most architecture optimization schemes is to improve generalization.

generalization, saliency, test error

Country:

North America > United States > California > San Francisco County > San Francisco (0.05)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Europe > Italy (0.04)
Europe > Denmark > Capital Region > Kongens Lyngby (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Communications > Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Williams, Christopher K. I., Rasmussen, Carl Edward

Gaussian Processes for Regression

The Bayesian analysis of neural networks is difficult because a simple prior over weights implies a complex prior distribution over functions. In this paper we investigate the use of Gaussian process priors over functions, which permit the predictive Bayesian analysis for fixed values of hyperparameters to be carried out exactly using matrix operations. Two methods, using optimization and averaging (via Hybrid Monte Carlo) over hyperparameters, have been tested on a number of challenging problems and have produced excellent results.

1 INTRODUCTION

In the Bayesian approach to neural networks a prior distribution over the weights induces a prior distribution over functions. This prior is combined with a noise model, which specifies the probability of observing the targets t given function values y, to yield a posterior over functions which can then be used for predictions. For neural networks the prior over functions has a complex form which means that implementations must either make approximations (e.g.
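The claim that prediction with fixed hyperparameters reduces to matrix operations can be illustrated with a minimal sketch. The squared-exponential covariance, the fixed hyperparameter values, and the names `sq_exp_cov` and `gp_predict` are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def sq_exp_cov(xa, xb, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs (an assumed choice)."""
    diff = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x_train, t_train, x_test, noise_var=0.01):
    """Predictive mean and variance of the function values at x_test, hyperparameters fixed."""
    # Covariance of the noisy training targets.
    K = sq_exp_cov(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_star = sq_exp_cov(x_train, x_test)
    # Cholesky factorisation instead of an explicit matrix inverse.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t_train))
    mean = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    var = np.diag(sq_exp_cov(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, var

# Hypothetical toy data: noise-free samples of sin(x).
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
t_train = np.sin(x_train)
mean, var = gp_predict(x_train, t_train, np.array([1.0, 5.0]))
```

In this toy setup the predictive variance is small at the training input `1.0` and close to the prior variance at `5.0`, which lies far from all the data.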

covariance function, gaussian process, hyperparameter

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Ghahramani, Zoubin, Jordan, Michael I.

Factorial Hidden Markov Models

Due to the simplicity and efficiency of its parameter estimation algorithm, the hidden Markov model (HMM) has emerged as one of the basic statistical tools for modeling discrete time series, finding widespread application in the areas of speech recognition (Rabiner and Juang, 1986) and computational molecular biology (Baldi et al., 1994). An HMM is essentially a mixture model, encoding information about the history of a time series in the value of a single multinomial variable (the hidden state). This multinomial assumption allows an efficient parameter estimation algorithm to be derived (the Baum-Welch algorithm). However, it also severely limits the representational capacity of HMMs.
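To make the role of the single multinomial hidden state concrete, here is a minimal sketch of the scaled forward recursion for a standard HMM, the likelihood computation on which Baum-Welch builds. It is not the factorial model of the paper. The parameter names `pi`, `A`, `B` and the toy values are assumptions chosen for illustration.

```python
import numpy as np

def hmm_log_likelihood(pi, A, B, obs):
    """Log-probability of an observation sequence under a discrete HMM.

    pi: initial state probabilities, shape (K,)
    A:  transition matrix, A[i, j] = P(next state j | current state i)
    B:  emission matrix, B[i, o] = P(symbol o | state i)
    """
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for o in obs[1:]:
        # Rescale to avoid underflow, accumulating the log of the scale factor.
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = ((alpha / scale) @ A) * B[:, o]
    return log_lik + np.log(alpha.sum())

# Hypothetical two-state, two-symbol model.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])
log_lik = hmm_log_likelihood(pi, A, B, [0, 1, 1])
```

Each step costs on the order of K squared operations for K hidden states, which is why the recursion is cheap for a single state variable but grows quickly if K must encode many independent factors.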

algorithm, hmm, markov model

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.07)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Geometry of Early Stopping in Linear Networks

Dodier, Robert H.

A theory of early stopping as applied to linear models is presented. The backpropagation learning algorithm is modeled as gradient descent in continuous time. Given a training set and a validation set, all weight vectors found by early stopping must lie on a certain quadric surface, usually an ellipsoid. Given a training set and a candidate early stopping weight vector, all validation sets have least-squares weights lying on a certain plane. This latter fact can be exploited to estimate the probability of stopping at any given point along the trajectory from the initial weight vector to the least-squares weights derived from the training set, and to estimate the probability that training goes on indefinitely. The prospects for extending this theory to nonlinear models are discussed.
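The setting can be sketched numerically. The snippet below is a discrete-step approximation of gradient descent on a linear model that keeps the weights with the lowest validation error along the trajectory. The function name, learning rate, step count, and synthetic data are assumptions for illustration and do not come from the paper.

```python
import numpy as np

def early_stopped_weights(X_tr, y_tr, X_va, y_va, lr=0.01, max_steps=5000):
    """Gradient descent on training squared error, returning the weights with
    the lowest validation error seen along the trajectory."""
    w = np.zeros(X_tr.shape[1])
    best_w = w.copy()
    best_err = np.mean((X_va @ w - y_va) ** 2)
    for _ in range(max_steps):
        # Small Euler step standing in for continuous-time gradient descent.
        grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
        w = w - lr * grad
        err = np.mean((X_va @ w - y_va) ** 2)
        if err < best_err:
            best_w, best_err = w.copy(), err
    return best_w, best_err

# Hypothetical noisy linear data, split into training and validation sets.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(40, 3))
y = X @ w_true + rng.normal(scale=0.5, size=40)
w_stop, val_err = early_stopped_weights(X[:20], y[:20], X[20:], y[20:])
```

If the validation error keeps decreasing all the way to the training least-squares solution, the returned weights are simply the last ones visited, which corresponds to the case the abstract describes as training going on indefinitely.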

early stopping, probability, validation

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.35)

Optimization Principles for the Neural Code

DeWeese, Michael

Recent experiments show that the neural codes at work in a wide range of creatures share some common features. At first sight, these observations seem unrelated. However, we show that these features arise naturally in a linear filtered threshold crossing (LFTC) model when we set the threshold to maximize the transmitted information. This maximization process requires neural adaptation to not only the DC signal level, as in conventional light and dark adaptation, but also to the statistical structure of the signal and noise distributions. We also present a new approach for calculating the mutual information between a neuron's output spike train and any aspect of its input signal which does not require reconstruction of the input signal. This formulation is valid provided the correlations in the spike train are small, and we provide a procedure for checking this assumption.

information, noise, spike train

Country:

North America > United States > New York (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)

Genre: Research Report (0.48)

Technology: Information Technology > Artificial Intelligence (0.47)

Shawe-Taylor, John, Zhao, Jieyu

Generalisation of A Class of Continuous Neural Networks

More recently attempts have been made to introduce some computational cost related to the accuracy of the computations [5]. The model proposed in this paper weakens the computational power still further by relying on classical boolean circuits to perform the computation using a simple encoding of the real values. Using this encoding we also show that TC0 circuits interpreted in the model correspond to a Neural Network design referred to as Bit Stream Neural Networks, which have been developed for hardware implementation [8]. With the perspective afforded by the general approach considered here, we are also able to analyse the Bit Stream Neural Networks (or indeed any other adaptive system based on the technique), giving VC dimension and sample size bounds for PAC learning.

bernoulli sequence, neural network, probability

Country: Europe > Switzerland (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Mansour, Yishay, Sahar, Sigal

Implementation Issues in the Fourier Transform Algorithm

Tel-Aviv University, Tel-Aviv, ISRAEL

The Fourier transform of boolean functions has come to play an important role in proving many important learnability results. We aim to demonstrate that the Fourier transform techniques are also a useful and practical algorithm in addition to being a powerful theoretical tool. We describe the more prominent changes we have introduced to the algorithm, ones that were crucial and without which the performance of the algorithm would severely deteriorate. One of the benefits we present is the confidence level for each prediction which measures the likelihood that the prediction is correct.

1 INTRODUCTION

It has been used mainly to demonstrate the learnability of various classes of functions with respect to the uniform distribution. The work of [5] developed a very powerful algorithmic procedure: given a function and a threshold parameter it finds in polynomial time all the Fourier coefficients of the function larger than the threshold.
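As an illustration of what "Fourier coefficients larger than the threshold" means, the sketch below computes coefficients of a boolean function exactly by enumerating all inputs. It is a brute-force stand-in that takes exponential time, not the polynomial-time procedure of [5]. The function names and the {0,1} to {-1,+1} conventions are assumptions for illustration.

```python
from itertools import combinations, product

def chi(S, x):
    """Parity character: +1 if x has an even number of ones on index set S, else -1."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def fourier_coefficient(f, n, S):
    """Exact coefficient E_x[f(x) * chi_S(x)] under the uniform distribution on {0,1}^n."""
    return sum(f(x) * chi(S, x) for x in product((0, 1), repeat=n)) / 2 ** n

def large_coefficients(f, n, threshold):
    """Brute force: every index set whose coefficient exceeds the threshold in magnitude."""
    found = {}
    for size in range(n + 1):
        for S in combinations(range(n), size):
            c = fourier_coefficient(f, n, S)
            if abs(c) > threshold:
                found[S] = c
    return found

def parity_02(x):
    """Hypothetical target: parity of bits 0 and 2, with outputs in {-1, +1}."""
    return -1 if (x[0] + x[2]) % 2 else 1

coeffs = large_coefficients(parity_02, 3, 0.5)
```

For this target, all of the Fourier weight sits on the single index set `(0, 2)`, so `coeffs` contains exactly one entry.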

algorithm, coefficient, hypothesis

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)
Information Technology > Data Science > Data Quality > Data Transformation (0.83)
Information Technology > Artificial Intelligence > Machine Learning (0.70)

DasGupta, Bhaskar, Sontag, Eduardo D.

Sample Complexity for Learning Recurrent Perceptron Mappings

Recurrent perceptron classifiers generalize the classical perceptron model. They take into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to experimental data.

1 Introduction

One of the most popular approaches to binary pattern classification, underlying many statistical techniques, is based on perceptrons or linear discriminants; see for instance the classical reference (Duda and Hart, 1973).

application, dichotomy, sample complexity

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)

Plasticity of Center-Surround Opponent Receptive Fields in Real and Artificial Neural Systems of Vision

Yasui, S., Furukawa, T., Yamada, M., Saito, T.

The center-surround opponent receptive field (CSRF) mechanism represents one such example. Here, analogous CSRFs are shown to be formed in an artificial neural network which learns to localize contours (edges) of the luminance difference. Furthermore, when the input pattern is corrupted by a background noise, the CSRFs of the hidden units become shallower and broader with decrease of the signal-to-noise ratio (SNR). The same kind of SNR-dependent plasticity is present in the CSRF of real visual neurons; in bipolar cells of the carp retina as is shown here experimentally, as well as in large monopolar cells of the fly compound eye as was described by others. Also, analogous SNR-dependent plasticity is shown to be present in the biphasic flash responses (BPFR) of these artificial and biological visual systems. Thus, the spatial (CSRF) and temporal (BPFR) filtering properties with which a wide variety of creatures see the world appear to be optimized for detectability of changes in space and time.

1 INTRODUCTION

A number of learning algorithms have been developed to make synthetic neural machines be trainable to function in certain optimal ways. If the brain and nervous systems that we see in nature are best answers of the evolutionary process, then one might be able to find some common 'softwares' in real and artificial neural systems. This possibility is examined in this paper, with respect to a basic visual

background noise, csrf, noise

Country:

Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.05)
Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.05)
North America > United States > New York (0.04)
North America > United States > District of Columbia > Washington (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Simulation of a Thalamocortical Circuit for Computing Directional Heading in the Rat

Blair, Hugh T.

Several regions of the rat brain contain neurons known as head-direction cells, which encode the animal's directional heading during spatial navigation. This paper presents a biophysical model of head-direction cell activity, which suggests that a thalamocortical circuit might compute the rat's head direction by integrating the angular velocity of the head over time. The model was implemented using the neural simulator NEURON, and makes testable predictions about the structure and function of the rat head-direction circuit.
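The underlying computation, heading as the time integral of angular velocity, can be shown with a short numerical sketch. This illustrates only the mathematical idea and is not the biophysical NEURON model described in the paper. The function name, the units (degrees and seconds), and the sample values are assumptions.

```python
import numpy as np

def integrate_heading(angular_velocity, dt, initial_heading=0.0):
    """Heading in degrees obtained by summing angular velocity (deg/s) over time steps of dt seconds."""
    headings = initial_heading + np.cumsum(angular_velocity) * dt
    # Wrap onto the circle so that headings stay in [0, 360).
    return np.mod(headings, 360.0)

# Hypothetical head turn: a constant 90 deg/s for one second, sampled every 10 ms.
omega = np.full(100, 90.0)
headings = integrate_heading(omega, dt=0.01)
```

A pure integrator like this accumulates any error in the velocity signal, which is one reason a heading estimate built this way needs an accurate velocity input.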

ahd cell, head direction, head-direction cell

Country:

North America > United States > Connecticut > New Haven County > New Haven (0.05)
North America > United States > Massachusetts > Plymouth County > Norwell (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.47)