Perceptrons
Learning Stochastic Perceptrons Under k-Blocking Distributions
We present a statistical method that PAC learns the class of stochastic perceptrons with arbitrary monotonic activation function and weights $w_i \in \{-1, 0, 1\}$ when the probability distribution that generates the input examples is a member of a family that we call k-blocking distributions. Such distributions represent an important step beyond the case where each input variable is statistically independent, since the 2k-blocking family contains all the Markov distributions of order k. By stochastic perceptron we mean a perceptron which, upon presentation of input vector x, outputs 1 with probability $f(\sum_i w_i x_i - \theta)$. Because the same algorithm works for any monotonic (nondecreasing or nonincreasing) activation function f on a Boolean domain, it handles the well-studied cases of sigmoids and the "usual" radial basis functions.
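The definition of a stochastic perceptron given above can be illustrated with a minimal sketch. It shows only the forward model (output 1 with probability $f(\sum_i w_i x_i - \theta)$), not the PAC learning algorithm, which the abstract does not spell out. The function names, the example weights and input, and the choice of a sigmoid for f are assumptions made for illustration.

```python
import math
import random

def firing_probability(weights, x, theta, activation):
    """Probability of outputting 1: activation(sum_i w_i * x_i - theta)."""
    field = sum(w * xi for w, xi in zip(weights, x)) - theta
    return activation(field)

def stochastic_perceptron(weights, x, theta, activation, rng):
    """Draw a single output in {0, 1} with the probability above."""
    p_one = firing_probability(weights, x, theta, activation)
    return 1 if rng.random() < p_one else 0

def sigmoid(z):
    """One example of a monotonic (nondecreasing) activation function."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical example: weights restricted to {-1, 0, 1}, Boolean input.
weights = [1, 0, -1, 1]
x = [1, 1, 0, 1]
theta = 1.0

p = firing_probability(weights, x, theta, sigmoid)  # field = 2 - 1 = 1
rng = random.Random(0)
samples = [stochastic_perceptron(weights, x, theta, sigmoid, rng)
           for _ in range(10000)]
empirical = sum(samples) / len(samples)
print(f"model probability {p:.3f}, empirical frequency {empirical:.3f}")
```

Repeated presentations of the same input give different outputs, so a learner can only estimate the firing probability from frequencies, which is why a statistical method is natural here.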
Learning in large linear perceptrons and why the thermodynamic limit is relevant to the real world
We present a new method for obtaining the response function $\mathcal{G}$ and its average $G$ from which most of the properties of learning and generalization in linear perceptrons can be derived. We first rederive the known results for the 'thermodynamic limit' of infinite perceptron size N and show explicitly that $\mathcal{G}$ is self-averaging in this limit. We then discuss extensions of our method to more general learning scenarios with anisotropic teacher space priors, input distributions, and weight decay terms. Finally, we use our method to calculate the finite N corrections of order 1/N to $G$ and discuss the corresponding finite size effects on generalization and learning dynamics. An important spin-off is the observation that results obtained in the thermodynamic limit are often directly relevant to systems of fairly modest, 'real-world' sizes.
On-line Learning of Dichotomies
The performance of on-line algorithms for learning dichotomies is studied. In on-line learning, the number of examples P is equivalent to the learning time, since each example is presented only once. The learning curve, or generalization error as a function of P, depends on the schedule at which the learning rate is lowered. For a target that is a perceptron rule, the learning curve of the perceptron algorithm can decrease as fast as $P^{-1}$, if the schedule is optimized. If the target is not realizable by a perceptron, the perceptron algorithm does not generally converge to the solution with lowest generalization error.
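The on-line setting described above can be sketched as follows: each example is drawn once, the perceptron algorithm updates only on mistakes, and the step size follows a schedule. This is a sketch under stated assumptions, not the paper's analysis. The Gaussian inputs, the schedule form `eta0 / (1 + p / tau)`, and all names are hypothetical, and the schedule is not the optimized one the abstract refers to. For isotropic inputs, the generalization error of a zero-threshold perceptron equals the angle between student and teacher divided by pi.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sign(z):
    return 1 if z >= 0 else -1

def generalization_error(w, teacher):
    """Angle between w and teacher over pi (valid for isotropic inputs)."""
    norm_w = math.sqrt(dot(w, w))
    norm_t = math.sqrt(dot(teacher, teacher))
    if norm_w == 0.0:
        return 0.5  # a zero vector predicts a constant label
    cosine = max(-1.0, min(1.0, dot(w, teacher) / (norm_w * norm_t)))
    return math.acos(cosine) / math.pi

def online_perceptron(teacher, num_examples, schedule, rng):
    """Perceptron algorithm; every example is seen exactly once."""
    w = [0.0] * len(teacher)
    for p in range(1, num_examples + 1):
        x = [rng.gauss(0.0, 1.0) for _ in teacher]
        label = sign(dot(teacher, x))  # target is a perceptron rule
        if sign(dot(w, x)) != label:   # update only on mistakes
            eta = schedule(p)
            w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def decaying_schedule(p, eta0=1.0, tau=100.0):
    """Hypothetical schedule: learning rate lowered roughly as 1/p."""
    return eta0 / (1.0 + p / tau)

rng = random.Random(0)
teacher = [rng.gauss(0.0, 1.0) for _ in range(10)]
w = online_perceptron(teacher, 5000, decaying_schedule, rng)
final_error = generalization_error(w, teacher)
print(f"generalization error after 5000 examples: {final_error:.4f}")
```

Passing a different `schedule` function (for example a constant one) makes it possible to compare learning curves empirically.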
Implementation of Neural Hardware with the Neural VLSI of URAN in Applications with Reduced Representations
This paper describes a neural hardware implementation using an analog-digital mixed-mode neural chip. The full-custom neural VLSI of the Universally Reconstructible Artificial Neural network (URAN) is used in the system. A multi-layer perceptron is trained successfully under limited accuracy in computations. The network, with a large frame input layer, is tested on recognizing spoken Korean words in forward retrieval. A multichip hardware module with eight or more chips is suggested for extended performance and capacity.
The Ni1000: High Speed Parallel VLSI for Implementing Multilayer Perceptrons
In this paper we present a new version of the standard multilayer perceptron (MLP) algorithm for the state-of-the-art in neural network VLSI implementations: the Intel Ni1000. This new version of the MLP uses a fundamental property of high dimensional spaces which allows the $l_2$-norm to be accurately approximated by the $l_1$-norm. This approach enables the standard MLP to utilize the parallel architecture of the Ni1000 to achieve on the order of 40,000 256-dimensional classifications per second. The Nestor/Intel radial basis function neural chip (Ni1000) contains the equivalent of 1024 256-dimensional artificial digital neurons and can perform at least 40,000 classifications per second [Sullivan, 1993]. To attain this great speed, the Ni1000 was designed to calculate "city block" distances (i.e., the $l_1$-norm) and thus to avoid the large number of multiplication units that would be required to calculate Euclidean dot products in parallel. Thus the Ni1000 is ideally suited to perform both the RCE [Reilly et al., 1982] and PRCE [Scofield et al., 1987] algorithms or any of the other commonly used radial basis function (RBF) algorithms.
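The claim that the $l_1$-norm can stand in for the $l_2$-norm in high dimensions can be checked numerically with a small sketch. The abstract does not give the exact approximation used on the Ni1000, so the following assumes vectors with independent zero-mean Gaussian components, for which $\|x\|_1 \approx \sqrt{2n/\pi}\,\|x\|_2$ in dimension n. The function names and the test setup are hypothetical.

```python
import math
import random

def l1_norm(x):
    """City-block norm: needs only absolute values and additions."""
    return sum(abs(v) for v in x)

def l2_norm(x):
    """Euclidean norm: needs one multiplication per component."""
    return math.sqrt(sum(v * v for v in x))

def approx_l2_from_l1(x):
    """Estimate the l2-norm from the l1-norm.

    Assumes roughly Gaussian components, for which
    E|x_i| = sqrt(2/pi) * sigma, so l1 ~ sqrt(2n/pi) * l2.
    """
    n = len(x)
    return l1_norm(x) * math.sqrt(math.pi / (2.0 * n))

rng = random.Random(0)
dimension = 256  # matches the 256-dimensional inputs mentioned above
relative_errors = []
for _ in range(200):
    x = [rng.gauss(0.0, 1.0) for _ in range(dimension)]
    exact = l2_norm(x)
    relative_errors.append(abs(approx_l2_from_l1(x) - exact) / exact)

mean_error = sum(relative_errors) / len(relative_errors)
max_error = max(relative_errors)
print(f"mean relative error {mean_error:.4f}, max {max_error:.4f}")
```

The relative error shrinks roughly as one over the square root of the dimension, which is why the approximation is only attractive for high-dimensional inputs.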
A Connectionist Technique for Accelerated Textual Input: Letting a Network Do the Typing
Each year people spend a huge amount of time typing. The text people type typically contains a tremendous amount of redundancy due to predictable word usage patterns and the text's structure. This paper describes a neural network system called AutoTypist that monitors a person's typing and predicts what will be entered next. AutoTypist displays the most likely subsequent word to the typist, who can accept it with a single keystroke, instead of typing it in its entirety. The multi-layer perceptron at the heart of AutoTypist adapts its predictions of likely subsequent text to the user's word usage pattern, and to the characteristics of the text currently being typed.
Predicting the Risk of Complications in Coronary Artery Bypass Operations using Neural Networks
Experiments demonstrated that sigmoid multilayer perceptron (MLP) networks provide slightly better risk prediction than conventional logistic regression when used to predict the risk of death, stroke, and renal failure on 1257 patients who underwent coronary artery bypass operations at the Lahey Clinic. MLP networks with no hidden layer and networks with one hidden layer were trained using stochastic gradient descent with early stopping. MLP networks and logistic regression used the same input features and were evaluated using bootstrap sampling with 50 replications. ROC areas for predicting mortality using preoperative input features were 70.5% for logistic regression and 76.0% for MLP networks. Regularization provided by early stopping was an important component of improved performance.
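The evaluation measure mentioned above, ROC area estimated over bootstrap replications, can be sketched in a few lines. The abstract does not describe the exact bootstrap protocol (for instance, whether models were retrained on each replication), so this sketch simply resamples a fixed set of scores and labels. The function names and the synthetic data are hypothetical and are not the study's data.

```python
import random

def roc_area(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count one half. Labels are 1 (event) or 0 (no event).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def bootstrap_roc_areas(scores, labels, replications, rng):
    """ROC area on samples drawn with replacement from the data."""
    size = len(scores)
    areas = []
    while len(areas) < replications:
        idx = [rng.randrange(size) for _ in range(size)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < size:  # skip samples missing one class
            areas.append(roc_area([scores[i] for i in idx], ys))
    return areas

# Synthetic illustration only: events tend to receive higher scores.
rng = random.Random(0)
labels = [1 if rng.random() < 0.2 else 0 for _ in range(300)]
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

areas = bootstrap_roc_areas(scores, labels, 50, rng)
mean_area = sum(areas) / len(areas)
print(f"mean ROC area over 50 replications: {mean_area:.3f}")
```

The spread of the 50 values gives a rough sense of how much a difference in ROC area between two models could be due to sampling alone.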
Learning Sparse Perceptrons
We introduce a new algorithm designed to learn sparse perceptrons over input representations which include high-order features. Our algorithm, which is based on a hypothesis-boosting method, is able to PAC-learn a relatively natural class of target concepts. Moreover, the algorithm appears to work well in practice: on a set of three problem domains, the algorithm produces classifiers that utilize small numbers of features yet exhibit good generalization performance. Perhaps most importantly, our algorithm generates concept descriptions that are easy for humans to understand.
Active Learning in Multilayer Perceptrons
We propose an active learning method with hidden-unit reduction. First, we review our active learning method, and point out that many Fisher-information-based methods applied to MLP have a critical problem: the information matrix may be singular. To solve this problem, we derive the singularity condition of an information matrix, and propose an active learning technique that is applicable to MLP. Its effectiveness is verified through experiments.
A Realizable Learning Task which Exhibits Overfitting
In this paper we examine a perceptron learning task. The task is realizable since it is provided by another perceptron with identical architecture. Both perceptrons have nonlinear sigmoid output functions. The gain of the output function determines the level of nonlinearity of the learning task. It is observed that a high level of nonlinearity leads to overfitting.