Machine Learning
Worst-case Loss Bounds for Single Neurons
Helmbold, David P., Kivinen, Jyrki, Warmuth, Manfred K.
We analyze and compare the well-known Gradient Descent algorithm and a new algorithm, called the Exponentiated Gradient algorithm, for training a single neuron with an arbitrary transfer function. Both algorithms are easily generalized to larger neural networks, and the generalization of Gradient Descent is the standard back-propagation algorithm. In this paper we prove worst-case loss bounds for both algorithms in the single neuron case. Since local minima make it difficult to prove worst-case bounds for gradient-based algorithms, we must use a loss function that prevents the formation of spurious local minima. We define such a matching loss function for any strictly increasing differentiable transfer function and prove a worst-case loss bound for any such transfer function and its corresponding matching loss. For example, the matching loss for the identity function is the square loss and the matching loss for the logistic sigmoid is the entropic loss. The different structure of the bounds for the two algorithms indicates that the new algorithm outperforms Gradient Descent when the inputs contain a large number of irrelevant components.
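The key property of the matching loss is that its gradient with respect to the weights reduces to (phi(w . x) - y) x, which makes both updates easy to state side by side. The Python sketch below contrasts the additive Gradient Descent update with the multiplicative Exponentiated Gradient update; the learning rate, the normalization to the probability simplex, and the function names are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def matching_loss_gradient(w, x, y, phi):
        # For the matching loss of a strictly increasing transfer function phi,
        # the gradient with respect to the weights is (phi(w . x) - y) * x.
        return (phi(np.dot(w, x)) - y) * x

    def gd_update(w, x, y, phi, eta=0.1):
        # Gradient Descent: additive update.
        return w - eta * matching_loss_gradient(w, x, y, phi)

    def eg_update(w, x, y, phi, eta=0.1):
        # Exponentiated Gradient: multiplicative update, renormalized so the
        # weights stay on the probability simplex.
        v = w * np.exp(-eta * matching_loss_gradient(w, x, y, phi))
        return v / v.sum()

The multiplicative form of the EG update is what favors it when only a few input components are relevant: irrelevant weights decay quickly toward zero.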
Generalisation of a Class of Continuous Neural Networks
Shawe-Taylor, John, Zhao, Jieyu
More recently, attempts have been made to introduce some computational cost related to the accuracy of the computations [5]. The model proposed in this paper weakens the computational power still further by relying on classical boolean circuits to perform the computation using a simple encoding of the real values. Using this encoding, we also show that TC0 circuits interpreted in the model correspond to a Neural Network design referred to as Bit Stream Neural Networks, which have been developed for hardware implementation [8]. With the perspective afforded by the general approach considered here, we are also able to analyse the Bit Stream Neural Networks (or indeed any other adaptive system based on the technique), giving VC dimension and sample size bounds for PAC learning.
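The abstract does not spell out the encoding, but a common bit-stream scheme (of the kind used in stochastic-computing hardware) conveys the idea: a real value in [0, 1] becomes a random bit stream with that mean, and simple boolean gates act directly on the streams. The values and stream length below are illustrative assumptions, not the paper's construction.

    import random

    def encode(r, n):
        # Represent a real r in [0, 1] as a length-n stream of Bernoulli(r) bits.
        return [1 if random.random() < r else 0 for _ in range(n)]

    def decode(stream):
        # The fraction of 1s estimates the encoded value.
        return sum(stream) / len(stream)

    # Bitwise AND of two independent streams estimates the product r1 * r2,
    # so a multiplication reduces to one boolean gate per bit.
    a, b = encode(0.6, 10000), encode(0.5, 10000)
    product = decode([u & v for u, v in zip(a, b)])  # approximately 0.30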
Neural Networks with Quadratic VC Dimension
Koiran, Pascal, Sontag, Eduardo D.
A set of labeled training samples is provided, and a network must be obtained which is then expected to correctly classify previously unseen inputs. In this context, a central problem is to estimate the amount of training data needed to guarantee satisfactory learning performance. To study this question, it is necessary to first formalize the notion of learning from examples. One such formalization is based on the paradigm of probably approximately correct (PAC) learning, due to Valiant (1984). In this framework, one starts by fitting some function f, chosen from a predetermined class F, to the given training data. The class F is often called the "hypothesis class", and for purposes of this discussion it will be assumed that the functions in F take binary values {0, 1} and are defined on a common domain X.
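For instance, under the standard PAC sample-complexity bounds (a well-known result, not a quotation from this paper), the number of training samples m sufficient for accuracy epsilon and confidence 1 - delta grows with the VC dimension d of the hypothesis class F:

    m = O\!\left(\frac{1}{\varepsilon}\left(d\,\ln\frac{1}{\varepsilon} + \ln\frac{1}{\delta}\right)\right), \qquad d = \mathrm{VCdim}(F)

This is why bounds on the VC dimension of neural network classes, such as the quadratic bound of this paper, translate directly into estimates of the required amount of training data.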
Rapid Quality Estimation of Neural Network Input Representations
Cherkauer, Kevin J., Shavlik, Jude W.
However, ANNs are usually costly to train, preventing one from trying many different representations. In this paper, we address this problem by introducing and evaluating three new measures for quickly estimating ANN input representation quality. Two of these, called DBleaves and Min(leaves), consistently outperform Rendell and Ragavan's (1993) blurring measure in accurately ranking different input representations for ANN learning on three difficult, real-world datasets.
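The two leading measures are named after decision-tree leaf counts, which suggests proxies of the following style: induce a cheap decision tree on each candidate representation and score the representation by how compactly the tree fits. The sketch below uses scikit-learn and a negated leaf count as an assumed scoring rule; it illustrates the style of measure, not the paper's exact definitions.

    from sklearn.tree import DecisionTreeClassifier

    def leaf_count_score(X, y):
        # Proxy for representation quality: a representation that lets a
        # quick-to-train model separate the classes with few leaves is
        # likely to be easy for an ANN to learn as well.
        tree = DecisionTreeClassifier().fit(X, y)
        return -tree.get_n_leaves()  # fewer leaves -> higher score

    # Rank candidate input representations without training an ANN on each:
    # representations = {"raw": X_raw, "engineered": X_eng}
    # ranked = sorted(representations,
    #                 key=lambda k: leaf_count_score(representations[k], y),
    #                 reverse=True)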
Implementation Issues in the Fourier Transform Algorithm
Tel-Aviv University, Tel-Aviv, Israel
The Fourier transform of boolean functions has come to play an important role in proving many important learnability results. We aim to demonstrate that the Fourier transform techniques are also a useful and practical algorithm, in addition to being a powerful theoretical tool. We describe the more prominent changes we have introduced to the algorithm, ones that were crucial and without which the performance of the algorithm would severely deteriorate. One of the benefits we present is the confidence level for each prediction, which measures the likelihood that the prediction is correct. The Fourier transform has been used mainly to demonstrate the learnability of various classes of functions with respect to the uniform distribution. The work of [5] developed a very powerful algorithmic procedure: given a function and a threshold parameter, it finds in polynomial time all the Fourier coefficients of the function larger than the threshold.
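The underlying primitive is easy to state: each Fourier coefficient of a boolean function is an expectation over uniformly random inputs, so it can be estimated by sampling. A minimal sketch, with f assumed to map {0,1}^n to {-1,+1} and the sample count chosen arbitrarily:

    import random

    def chi(S, x):
        # Parity character chi_S(x) = (-1)^(sum of x_i over i in S).
        return -1 if sum(x[i] for i in S) % 2 else 1

    def estimate_coefficient(f, S, n, samples=10000):
        # hat{f}(S) = E_x[f(x) * chi_S(x)] under the uniform distribution,
        # estimated here by straightforward random sampling.
        total = 0
        for _ in range(samples):
            x = [random.randint(0, 1) for _ in range(n)]
            total += f(x) * chi(S, x)
        return total / samples

A prediction made by summing the recovered large coefficients also comes with a natural confidence signal: the magnitude of the sum before taking its sign.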
Generalized Learning Vector Quantization
We propose a new learning method, "Generalized Learning Vector Quantization (GLVQ)," in which reference vectors are updated based on the steepest descent method in order to minimize the cost function. The cost function is determined so that the obtained learning rule satisfies the convergence condition. We prove that Kohonen's rule as used in LVQ does not satisfy the convergence condition and thus degrades recognition ability. Experimental results for printed Chinese character recognition reveal that GLVQ is superior to LVQ in recognition ability.
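A common formulation of the GLVQ update uses the relative distance mu = (d1 - d2)/(d1 + d2) as the misclassification measure and descends on a monotone cost f(mu). The sketch below uses a sigmoid cost and a fixed learning rate as illustrative choices; it follows the steepest-descent recipe the abstract describes rather than reproducing the paper's exact derivation.

    import numpy as np

    def glvq_update(x, w1, w2, eta=0.05):
        # w1: closest prototype of the correct class;
        # w2: closest prototype of any incorrect class.
        d1 = np.sum((x - w1) ** 2)
        d2 = np.sum((x - w2) ** 2)
        mu = (d1 - d2) / (d1 + d2)          # misclassification measure in [-1, 1]
        f = 1.0 / (1.0 + np.exp(-mu))       # sigmoid cost f(mu)
        fprime = f * (1.0 - f)              # its derivative scales the step
        denom = (d1 + d2) ** 2
        # Steepest descent on f(mu): pull the correct prototype toward x,
        # push the incorrect prototype away from x.
        w1 = w1 + eta * fprime * (4 * d2 / denom) * (x - w1)
        w2 = w2 - eta * fprime * (4 * d1 / denom) * (x - w2)
        return w1, w2

Because both prototypes move by the gradient of a single bounded cost, the updates shrink as mu saturates, which is the convergence behavior plain LVQ lacks.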
Learning Sparse Perceptrons
Jackson, Jeffrey C., Craven, Mark
We introduce a new algorithm designed to learn sparse perceptrons over input representations which include high-order features. Our algorithm, which is based on a hypothesis-boosting method, is able to PAC-learn a relatively natural class of target concepts. Moreover, the algorithm appears to work well in practice: on a set of three problem domains, the algorithm produces classifiers that utilize small numbers of features yet exhibit good generalization performance. Perhaps most importantly, our algorithm generates concept descriptions that are easy for humans to understand. Multi-layer perceptron (MLP) learning is a powerful method for tasks such as concept classification. However, in many applications, such as those that may involve scientific discovery, it is crucial to be able to explain predictions. Multi-layer perceptrons are limited in this regard, since their representations are notoriously difficult for humans to understand.
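As an illustration of the hypothesis-boosting idea (the paper's exact procedure is not reproduced here; this sketch swaps in an AdaBoost-style loop with single features as weak hypotheses), each boosting round adds one feature to the weighted vote, so the final classifier is a perceptron with at most one nonzero weight per round:

    import numpy as np

    def boost_sparse_perceptron(X, y, rounds=10):
        # X: (m, n) matrix of {-1,+1} features (may include high-order
        # products); y: labels in {-1,+1}.
        m, n = X.shape
        D = np.full(m, 1.0 / m)                 # distribution over examples
        w = np.zeros(n)
        for _ in range(rounds):
            # Weighted correlation of each feature with the labels.
            corr = (X * y[:, None] * D[:, None]).sum(axis=0)
            j = np.argmax(np.abs(corr))         # best single-feature hypothesis
            eps = (1 - abs(corr[j])) / 2        # its weighted error
            alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
            h = np.sign(corr[j]) * X[:, j]      # weak hypothesis output
            w[j] += np.sign(corr[j]) * alpha    # fold it into the vote
            D *= np.exp(-alpha * y * h)         # reweight hard examples
            D /= D.sum()
        return w                                # predict with sign(X @ w)

Sparsity here is structural: only features chosen by some round ever receive weight, which is also what makes the resulting description readable.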
Symplectic Nonlinear Component Analysis
Statistically independent features can be extracted by finding a factorial representation of a signal distribution. Principal Component Analysis (PCA) accomplishes this for linearly correlated and Gaussian-distributed signals. Independent Component Analysis (ICA), formalized by Comon (1994), extracts features in the case of linearly statistically dependent but not necessarily Gaussian-distributed signals. Nonlinear Component Analysis, finally, should find a factorial representation for nonlinearly statistically dependent signals. This paper proposes for this task a novel feed-forward, information-conserving, nonlinear map: the explicit symplectic transformation. It also solves the problem of non-Gaussian output distributions by considering single-coordinate higher-order statistics. In previous papers, Deco and Brauer (1994) and Parra, Deco, and Miesbach (1995) suggest volume-conserving transformations and factorization as the key elements for a nonlinear version of Independent Component Analysis. As a general class of volume-conserving transformations, Parra et al. (1995) propose the symplectic transformation. It was defined by an implicit nonlinear equation, which leads to a complex relaxation procedure for the function recall. In this paper an explicit form of the symplectic map is proposed, thus overcoming the computational problems.
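A minimal sketch of an explicit, volume-conserving symplectic map built from two closed-form shears (the potentials grad_G and grad_H are assumed placeholders; the paper's exact parametrization differs): each shear has unit Jacobian determinant, so the composition conserves volume, and the output is computed directly rather than by relaxing an implicit equation.

    import numpy as np

    def symplectic_shear(q, p, grad_G, grad_H):
        # Composition of two explicit symplectic shears:
        #   p' = p - grad_G(q)   then   q' = q + grad_H(p')
        # Both steps are triangular, so the Jacobian determinant is 1
        # (volume conserving) and the map inverts in closed form.
        p_new = p - grad_G(q)
        q_new = q + grad_H(p_new)
        return q_new, p_new

    # Example with illustrative tanh nonlinearities as the shear gradients:
    # q1, p1 = symplectic_shear(q0, p0, np.tanh, np.tanh)

Inverting is just the two shears applied in reverse with flipped signs, which is what makes the explicit form cheap at recall time.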
Active Gesture Recognition using Learned Visual Attention
Darrell, Trevor, Pentland, Alex
We have developed a foveated gesture recognition system that runs in an unconstrained office environment with an active camera. Using vision routines previously implemented for an interactive environment, we determine the spatial location of salient body parts of a user and guide an active camera to obtain images of gestures or expressions. A hidden-state reinforcement learning paradigm is used to implement visual attention. The attention module selects targets to foveate based on the goal of successful recognition, and uses a new multiple-model Q-learning formulation. Given a set of target and distractor gestures, our system can learn where to foveate to maximally discriminate a particular gesture. Vision has numerous uses in the natural world.
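The core of the attention module's learning signal can be illustrated with a plain tabular Q-learning step (the paper's multiple-model, hidden-state formulation is richer; the state, action, and reward encodings here are assumptions): an action is a candidate foveation target and the reward marks recognition success.

    def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
        # One tabular Q-learning step. Here an action is a foveation target
        # (a salient body part) and r is 1 on a correct recognition, else 0.
        best_next = max(Q.get((s_next, b), 0.0) for b in actions)
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

    # At decision time the camera foveates the target with the highest
    # learned value: argmax over a of Q[(s, a)].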
Improving Committee Diagnosis with Resampling Techniques
Parmanto, Bambang, Munro, Paul W., Doyle, Howard R.
Central to the performance improvement of a committee relative to individual networks is the error correlation between networks in the committee. We investigated methods of achieving error independence between the networks by training the networks with different resampling sets from the original training set. The methods were tested on the sine-wave artificial task and the real-world problems of hepatoma (liver cancer) and breast cancer diagnoses.
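One resampling scheme of the kind described is bootstrap aggregation: train each committee member on a different resample of the training set so that the members' errors decorrelate, then combine their outputs. A minimal sketch, with `fit` standing in for any network training routine (an assumption; the paper compares several resampling variants):

    import numpy as np

    def committee_predict(X_train, y_train, X_test, fit, n_members=10, rng=None):
        # Train each member on a bootstrap resample (sampling with
        # replacement) and average the members' predictions.
        rng = rng or np.random.default_rng(0)
        m = len(X_train)
        preds = []
        for _ in range(n_members):
            idx = rng.integers(0, m, size=m)         # bootstrap indices
            model = fit(X_train[idx], y_train[idx])  # train one member
            preds.append(model.predict(X_test))
        return np.mean(preds, axis=0)                # committee output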