AITopics

Further im- 528 F. Leisch and K. Hornik provements should be possible based on a better understanding of the theoretical properties of resample and combine techniques. These issues are currently being investigated.

classifier, new adaptive resampling algorithm, variance, (13 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.29)
North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Lee, Daniel D., Seung, H. Sebastian

Unsupervised Learning by Convex and Conic Coding

Unsupervised learning algorithms based on convex and conic encoders are proposed. The encoders find the closest convex or conic combination of basis vectors to the input. The learning algorithms produce basis vectors that minimize the reconstruction error of the encoders. The convex algorithm develops locally linear models of the input, while the conic algorithm discovers features. Both algorithms are used to model handwritten digits and compared with vector quantization and principal component analysis.

algorithm, basis vector, encoder, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Unification of Information Maximization and Minimization

Kamimura, Ryotaro

In the present paper, we propose a method to unify information maximization and minimization in hidden units. The information maximization and minimization are performed on two different levels: collective and individual level. Thus, two kinds of information: collective and individual information are defined. By maximizing collective information and by minimizing individual information, simple networks can be generated in terms of the number of connections and the number of hidden units. Obtained networks are expected to give better generalization and improved interpretation of internal representations.

individual information, information, information controller, (12 more...)

Country:

North America > United States > New York (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Jordan, Michael I., Ghahramani, Zoubin, Saul, Lawrence K.

Hidden Markov Decision Trees

We study a time series model that can be viewed as a decision tree with Markov temporal structure. The model is intractable for exact calculations, thus we utilize variational approximations. We consider three different distributions for the approximation: one in which the Markov calculations are performed exactly and the layers of the decision tree are decoupled, one in which the decision tree calculations are performed exactly and the time steps of the Markov chain are decoupled, and one in which a Viterbi-like assumption is made to pick out a single most likely state sequence.

approximation, decision tree, node, (15 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.09)
North America > United States > California > San Mateo County > Redwood City (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Combinations of Weak Classifiers

Ji, Chuanyi, Ma, Sheng

To obtain classification systems with both good generalization performance and efficiency in space and time, we propose a learning method based on combinations of weak classifiers, where weak classifiers are linear classifiers (perceptrons) which can do a little better than making random guesses. A randomized algorithm is proposed to find the weak classifiers. They· are then combined through a majority vote. As demonstrated through systematic experiments, the method developed is able to obtain combinations of weak classifiers with good generalization performance and a fast training time on a variety of test problems and real applications.

algorithm, classifier, weak classifier, (14 more...)

Country:

North America > United States > New York > Rensselaer County > Troy (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.47)

Industry: Education (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.32)

Jaakkola, Tommi, Jordan, Michael I.

Recursive Algorithms for Approximating Probabilities in Graphical Models

We develop a recursive node-elimination formalism for efficiently approximating large probabilistic networks. No constraints are set on the network topologies. Yet the formalism can be straightforwardly integrated with exact methods whenever they are/become applicable. The approximations we use are controlled: they maintain consistently upper and lower bounds on the desired quantities at all times. We show that Boltzmann machines, sigmoid belief networks, or any combination (i.e., chain graphs) can be handled within the same framework.

boltzmann machine, partition function, recursion, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.08)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.38)

Hyvärinen, Aapo, Oja, Erkki

One-unit Learning Rules for Independent Component Analysis

Neural one-unit learning rules for the problem of Independent Component Analysis (ICA) and blind source separation are introduced. In these new algorithms, every ICA neuron develops into a separator that finds one of the independent components. The learning rules use very simple constrained Hebbianjanti-Hebbian learning in which decorrelating feedback may be added. To speed up the convergence of these stochastic gradient descent rules, a novel computationally efficient fixed-point algorithm is introduced. 1 Introduction Independent Component Analysis (ICA) (Comon, 1994; Jutten and Herault, 1991) is a signal processing technique whose goal is to express a set of random variables as linear combinations of statistically independent component variables. The main applications of ICA are in blind source separation, feature extraction, and blind deconvolution.

algorithm, fixed-point algorithm, kurtosis, (14 more...)

Country:

Europe > Finland > Uusimaa > Helsinki (0.05)
North America > United States > Colorado > Denver County > Denver (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Hochreiter, Sepp, Schmidhuber, Jürgen

LSTM can Solve Hard Long Time Lag Problems

Standard recurrent nets cannot deal with long minimal time lags between relevant signals. Several recent NIPS papers propose alternative methods. We first show: problems used to promote various previous algorithms can be solved more quickly by random weight guessing than by the proposed algorithms. We then use LSTM, our own recent algorithm, to solve a hard problem that can neither be quickly solved by random search nor by any other recurrent net algorithm we are aware of.

algorithm, memory cell, sequence, (16 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.05)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Balancing Between Bagging and Bumping

Heskes, Tom

We compare different methods to combine predictions from neural networks trained on different bootstrap samples of a regression problem. One of these methods, introduced in [6] and which we here call balancing, is based on the analysis of the ensemble generalization error into an ambiguity term and a term incorporating generalization performances of individual networks. We show how to estimate these individual errors from the residuals on validation patterns. Weighting factors for the different networks follow from a quadratic programming problem. On a real-world problem concerning the prediction of sales figures and on the well-known Boston housing data set, balancing clearly outperforms other recently proposed alternatives as bagging [1] and bumping [8]. 1 EARLY STOPPING AND BOOTSTRAPPING Stopped training is a popular strategy to prevent overfitting in neural networks.

generalization error, individual generalization error, validation, (15 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Netherlands > Gelderland > Nijmegen (0.05)

Industry: Banking & Finance > Real Estate (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.36)

Fritsch, Jürgen, Finke, Michael, Waibel, Alex

Adaptively Growing Hierarchical Mixtures of Experts

We propose a novel approach to automatically growing and pruning Hierarchical Mixtures of Experts. The constructive algorithm proposed here enables large hierarchies consisting of several hundred experts to be trained effectively. We show that HME's trained by our automatic growing procedure yield better generalization performance than traditional static and balanced hierarchies. Evaluation of the algorithm is performed (1) on vowel classification and (2) within a hybrid version of the JANUS r9] speech recognition system using a subset of the Switchboard large-vocabulary speaker-independent continuous speech recognition database.

hierarchical mixture, hme, probability, (17 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > New York (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.90)