AITopics

We compare two regularization methods which can be used to improve the generalization capabilities of Gaussian mixture density estimates. The first method uses a Bayesian prior on the parameter space. We derive EM (Expectation Maximization) update rules which maximize the a posteriori parameter probability. In the second approach we apply ensemble averaging to density estimation. This includes Breiman's "bagging", which recently has been found to produce impressive results for classification networks.
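The ensemble-averaging idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' method: each ensemble member here is a single 1-D Gaussian fitted by maximum likelihood rather than a full mixture trained with EM, and the names `fit_gaussian`, `bagged_density`, `n_models` and the toy data are assumptions made for the example.

```python
import math
import random

def fit_gaussian(sample):
    """Maximum-likelihood mean and variance of a 1-D sample."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n
    return mean, max(var, 1e-6)  # floor the variance to avoid a degenerate fit

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bagged_density(data, n_models=25, seed=0):
    """Fit one model per bootstrap resample and return the averaged density."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) points from data with replacement.
        resample = [rng.choice(data) for _ in data]
        models.append(fit_gaussian(resample))

    def density(x):
        # Ensemble average of the individual density estimates.
        return sum(gaussian_pdf(x, m, v) for m, v in models) / len(models)

    return density

# Toy data, assumed for illustration only.
data = [-1.2, -0.7, -0.3, 0.0, 0.1, 0.4, 0.9, 1.5]
p = bagged_density(data)
```

Because every member is a normalized density, their average is also a normalized density, which is what makes plain averaging a valid way to combine the estimates.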

improved gaussian mixture density estimate, parameter estimate, regularization, (9 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.05)
North America > United States > California (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)

Williams, Christopher K. I., Rasmussen, Carl Edward

Gaussian Processes for Regression

The Bayesian analysis of neural networks is difficult because a simple prior over weights implies a complex prior distribution over functions. In this paper we investigate the use of Gaussian process priors over functions, which permit the predictive Bayesian analysis for fixed values of hyperparameters to be carried out exactly using matrix operations. Two methods, using optimization and averaging (via Hybrid Monte Carlo) over hyperparameters have been tested on a number of challenging problems and have produced excellent results.

1 INTRODUCTION

In the Bayesian approach to neural networks a prior distribution over the weights induces a prior distribution over functions. This prior is combined with a noise model, which specifies the probability of observing the targets t given function values y, to yield a posterior over functions which can then be used for predictions. For neural networks the prior over functions has a complex form which means that implementations must either make approximations (e.g.
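As a rough illustration of how Gaussian process prediction with fixed hyperparameters reduces to matrix operations, the sketch below computes the standard predictive mean and variance at one test input. The squared-exponential covariance, the hyperparameter values, the helper names (`sq_exp_kernel`, `solve`, `gp_predict`) and the toy data are assumptions for the example; the abstract does not specify them.

```python
import math

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed choice)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict(xs, ts, x_star, noise_var=1e-2):
    """Predictive mean and variance of the function value at x_star."""
    n = len(xs)
    # Covariance of the noisy targets: K + noise_var * I.
    K = [[sq_exp_kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [sq_exp_kernel(x, x_star) for x in xs]
    alpha = solve(K, ts)      # (K + noise_var I)^-1 t
    v = solve(K, k_star)      # (K + noise_var I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = sq_exp_kernel(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var

# Toy data, assumed for illustration: noise-free samples of sin(x).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ts = [math.sin(x) for x in xs]
mean_near, var_near = gp_predict(xs, ts, 1.0)
mean_far, var_far = gp_predict(xs, ts, 10.0)
```

Near the training inputs the predictive variance shrinks well below the prior variance, while far from the data it returns to the prior value; a production implementation would typically use a Cholesky factorization rather than the plain elimination shown here.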

covariance function, gaussian process, hyperparameter, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Hofmann, Reimar, Tresp, Volker

Discovering Structure in Continuous Variables Using Bayesian Networks

We study Bayesian networks for continuous variables using nonlinear conditional density estimators. We demonstrate that useful structures can be extracted from a data set in a self-organized way and we present sampling techniques for belief update based on Markov blanket conditional density models.

bayesian network, conditional density model, density model, (12 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Hihi, Salah El, Bengio, Yoshua

Hierarchical Recurrent Neural Networks for Long-Term Dependencies

Learning long-term dependencies is not as difficult with NARX recurrent neural networks.

dependency, long-term dependency, time scale, (15 more...)

Country:

North America > Canada > Quebec > Montreal (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Wu, Lizhong, Moody, John E.

A Smoothing Regularizer for Recurrent Neural Networks

We derive a smoothing regularizer for recurrent network models by requiring robustness in prediction performance to perturbations of the training data. The regularizer can be viewed as a generalization of the first order Tikhonov stabilizer to dynamic models. The closed-form expression of the regularizer covers both time-lagged and simultaneous recurrent nets, with feedforward nets and one-layer linear nets as special cases. We have successfully tested this regularizer in a number of case studies and found that it performs better than standard quadratic weight decay.

1 Introduction

One technique for preventing a neural network from overfitting noisy data is to add a regularizer to the error function being minimized. Regularizers typically smooth the fit to noisy data. Well-established techniques include ridge regression, see (Hoerl & Kennard 1970), and more generally spline smoothing functions or Tikhonov stabilizers that penalize the mth-order squared derivatives of the function being fit, as in (Tikhonov & Arsenin 1977), (Eubank 1988), (Hastie & Tibshirani 1990) and (Wahba 1990). These methods have recently been extended to networks of radial basis functions (Girosi, Jones & Poggio 1995), and several heuristic approaches have been developed for sigmoidal neural networks, for example, quadratic weight decay (Plaut, Nowlan & Hinton 1986), weight elimination (Scalettar & Zee 1988), (Chauvin 1990), (Weigend, Rumelhart & Huberman 1990) and soft weight sharing (Nowlan & Hinton 1992).

neural network, regularizer, weight decay, (13 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > United States > New York (0.05)
North America > United States > Oregon > Multnomah County > Portland (0.04)
(3 more...)

Industry: Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.42)

Baldi, Pierre, Hornik, Kurt

Universal Approximation and Learning of Trajectories Using Oscillators

Natural and artificial neural circuits must be capable of traversing specific state space trajectories. A natural approach to this problem is to learn the relevant trajectories from examples. Unfortunately, gradient descent learning of complex trajectories in amorphous networks is unsuccessful. We suggest a possible approach where trajectories are realized by combining simple oscillators, in various modular ways. We contrast two regimes of fast and slow oscillations. In all cases, we show that banks of oscillators with bounded frequencies have universal approximation properties. Open questions are also discussed briefly.

module, oscillator, trajectory, (12 more...)

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > Austria (0.04)
Asia > Japan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Sato, Atsushi, Yamada, Keiji

Generalized Learning Vector Quantization

We propose a new learning method, "Generalized Learning Vector Quantization (GLVQ)," in which reference vectors are updated based on the steepest descent method in order to minimize the cost function. The cost function is determined so that the obtained learning rule satisfies the convergence condition. We prove that Kohonen's rule as used in LVQ does not satisfy the convergence condition and thus degrades recognition ability. Experimental results for printed Chinese character recognition reveal that GLVQ is superior to LVQ in recognition ability.
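A steepest-descent update of reference vectors, as described in the abstract above, might look like the following sketch. The abstract does not give the cost function, so the relative distance measure mu = (d1 - d2) / (d1 + d2) and the sigmoid applied to it are assumptions taken from the commonly cited form of GLVQ; the function names, the learning rate `lr` and the toy prototypes are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def glvq_step(x, label, prototypes, proto_labels, lr=0.1):
    """One steepest-descent update; returns the new list of prototypes."""
    same = [i for i, c in enumerate(proto_labels) if c == label]
    other = [i for i, c in enumerate(proto_labels) if c != label]
    # Nearest reference vector of the correct class and of any other class.
    i1 = min(same, key=lambda i: sq_dist(x, prototypes[i]))
    i2 = min(other, key=lambda i: sq_dist(x, prototypes[i]))
    d1, d2 = sq_dist(x, prototypes[i1]), sq_dist(x, prototypes[i2])
    mu = (d1 - d2) / (d1 + d2)        # assumed relative distance measure
    f = 1.0 / (1.0 + math.exp(-mu))   # assumed sigmoid cost f(mu)
    df = f * (1.0 - f)                # derivative of the sigmoid
    # Gradient of f(mu) with respect to each prototype, scaled by lr.
    scale = 4.0 * lr * df / (d1 + d2) ** 2
    new = [list(w) for w in prototypes]
    new[i1] = [w + scale * d2 * (xi - w) for w, xi in zip(prototypes[i1], x)]
    new[i2] = [w - scale * d1 * (xi - w) for w, xi in zip(prototypes[i2], x)]
    return new

# Toy setup, assumed for illustration: one prototype per class.
prototypes = [[0.0, 0.0], [2.0, 2.0]]
proto_labels = [0, 1]
x, label = [1.5, 1.0], 0
updated = glvq_step(x, label, prototypes, proto_labels)
```

In this form both the nearest correct and the nearest incorrect reference vector move on every step: the correct one is attracted toward the input and the incorrect one is repelled, with step sizes weighted by the other vector's distance.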

cost function, reference vector, vector, (12 more...)

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.69)

Blatt, Marcelo, Wiseman, Shai, Domany, Eytan

Clustering data through an analogy to the Potts model

A new approach for clustering is proposed. This method is based on an analogy to a physical model; the ferromagnetic Potts model at thermal equilibrium is used as an analog computer for this hard optimization problem. We do not assume any structure of the underlying distribution of the data. Phase space of the Potts model is divided into three regions: the ferromagnetic, super-paramagnetic and paramagnetic phases. The region of interest is the super-paramagnetic phase, where domains of aligned spins appear.

potts model, spin spin correlation function, super-paramagnetic phase, (13 more...)

Country:

Asia > Middle East > Israel (0.05)
Europe > Germany (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Omohundro, Stephen M.

Family Discovery

"Family discovery" is the task of learning the dimension and structure of a parameterized family of stochastic models. It is especially appropriate when the training examples are partitioned into "episodes" of samples drawn from a single parameter value. We present three family discovery algorithms based on surface learning and show that they significantly improve performance over two alternatives on a parameterized classification task.

algorithm, family discovery algorithm, parameterized family, (13 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.05)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Konig, Yochai, Bourlard, Hervé, Morgan, Nelson

REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities - Application to Transition-Based Connectionist Speech Recognition

In this paper, we introduce REMAP, an approach for the training and estimation of posterior probabilities using a recursive algorithm that is reminiscent of the EM-based Forward-Backward (Liporace 1982) algorithm for the estimation of sequence likelihoods. Although very general, the method is developed in the context of a statistical model for transition-based speech recognition using Artificial Neural Networks (ANN) to generate probabilities for Hidden Markov Models (HMMs). In the new approach, we use local conditional posterior probabilities of transitions to estimate global posterior probabilities of word sequences. Although we still use ANNs to estimate posterior probabilities, the network is trained with targets that are themselves estimates of local posterior probabilities. An initial experimental result shows a significant decrease in error-rate in comparison to a baseline system.

1 INTRODUCTION

The ultimate goal in speech recognition is to determine the sequence of words that has been uttered.

algorithm, posterior probability, probability, (11 more...)

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.05)
North America > United States > California > Alameda County > Berkeley (0.05)
North America > United States > Oregon (0.04)
Europe > Belgium (0.04)

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.91)