Information Technology
A Constructive Learning Algorithm for Discriminant Tangent Models
Sona, Diego, Sperduti, Alessandro, Starita, Antonina
To reduce the computational complexity of classification systems using tangent distance, Hastie et al. (HSS) developed an algorithm that devises rich models for representing large subsets of the data and automatically computes the "best" associated tangent subspace. Schwenk & Milgram proposed a discriminant modular classification system (Diabolo) based on several autoassociative multilayer perceptrons that use tangent distance as the reconstruction error measure. We propose a gradient-based constructive learning algorithm for building a tangent subspace model with discriminant capabilities, combining several of the advantages of both HSS and Diabolo: the devised tangent models have discriminant capabilities; space requirements are lower than for HSS, since our algorithm is discriminant and thus needs fewer prototype models; the dimension of the tangent subspace is determined automatically by the constructive algorithm; and our algorithm is able to learn new transformations.
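As a rough illustration of the primitive underlying all three approaches, the sketch below computes a one-sided tangent distance: the distance from a pattern to the affine subspace spanned by a prototype and its tangent vectors, obtained as a least-squares residual. The function name and NumPy formulation are ours, not the authors'.

```python
import numpy as np

def tangent_distance(x, prototype, tangents):
    """One-sided tangent distance between a pattern x and the affine
    subspace {prototype + tangents @ a}. tangents has shape (d, k):
    one column per tangent vector (e.g. derivatives of the prototype
    with respect to small rotations, translations, or scalings)."""
    diff = x - prototype
    # Least-squares coefficients of diff on the tangent subspace.
    a, *_ = np.linalg.lstsq(tangents, diff, rcond=None)
    residual = diff - tangents @ a
    return np.linalg.norm(residual)
```

A nearest-prototype classifier built on this distance is invariant, to first order, to the transformations encoded in the tangent vectors.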
Multilayer Neural Networks: One or Two Hidden Layers?
Brightwell, Graham, Kenyon, Claire, Paugam-Moisy, Hélène
The number of hidden layers is a crucial parameter for the architecture of multilayer neural networks. Early research, in the 1960s, addressed the problem of exactly realizing Boolean functions with binary networks or binary multilayer networks. On the one hand, more recent work has focused on approximately realizing real functions with multilayer neural networks with one hidden layer [6, 7, 11] or with two hidden layers [2]. On the other hand, some authors [1, 12] have been interested in finding bounds on the architecture of multilayer networks for the exact realization of a finite set of points.
One-unit Learning Rules for Independent Component Analysis
Neural one-unit learning rules for the problem of Independent Component Analysis (ICA) and blind source separation are introduced. In these new algorithms, every ICA neuron develops into a separator that finds one of the independent components. The learning rules use very simple constrained Hebbian/anti-Hebbian learning, to which decorrelating feedback may be added. To speed up the convergence of these stochastic gradient descent rules, a novel, computationally efficient fixed-point algorithm is introduced. Independent Component Analysis (ICA) (Comon, 1994; Jutten and Herault, 1991) is a signal processing technique whose goal is to express a set of random variables as linear combinations of statistically independent component variables. The main applications of ICA are in blind source separation, feature extraction, and blind deconvolution.
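A minimal sketch of a one-unit fixed-point iteration of the kind described, assuming the data have already been whitened; the kurtosis-based update w <- E{x (w'x)^3} - 3w followed by renormalization is one standard choice, and the function name and parameters are illustrative.

```python
import numpy as np

def fixed_point_ica_unit(X, n_iter=100, tol=1e-6, seed=0):
    """One-unit fixed-point ICA on whitened data X (shape: dims x samples).
    Iterates w <- E{x (w'x)^3} - 3w, then renormalizes; for whitened
    inputs this converges to the direction of one independent component."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ X                                # projections, shape (n,)
        w_new = (X * y**3).mean(axis=1) - 3 * w  # kurtosis-based update
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:      # direction has converged
            return w_new
        w = w_new
    return w
```

Unlike a stochastic gradient rule, each iteration uses the whole batch, which is what buys the fast convergence.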
Time Series Prediction using Mixtures of Experts
Zeevi, Assaf J., Meir, Ron, Adler, Robert J.
We wish to exploit the linear autoregressive technique in a manner that will enable a substantial increase in modeling power, in a framework which is nonlinear and yet mathematically tractable. The novel model, whose main building blocks are linear AR models, deviates from linearity in the integration process, that is, the way these blocks are combined. This model was first formulated in the context of a regression problem, and an extension to a hierarchical structure was also given [2]. It was termed the mixture of experts model (MEM). Variants of this model have recently been used in prediction problems both in economics and engineering.
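A minimal sketch of a one-step MEM forecast under the stated structure: m linear AR(p) experts combined by a softmax gate that depends on the same lag vector, so that nonlinearity enters only through the integration (gating) step. All parameter names are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def mem_predict(x_lags, gate_W, gate_b, ar_coefs, ar_intercepts):
    """One-step prediction of a mixture-of-experts model whose experts
    are linear AR(p) predictors and whose gate is a softmax over linear
    functions of the lag vector x_lags (shape (p,)).
    gate_W: (m, p), gate_b: (m,), ar_coefs: (m, p), ar_intercepts: (m,)."""
    g = softmax(gate_W @ x_lags + gate_b)        # mixing weights, sum to 1
    experts = ar_coefs @ x_lags + ar_intercepts  # each expert's AR forecast
    return g @ experts                           # convex combination
```

Because each expert is linear and the gate is smooth, the model stays mathematically tractable while the overall predictor is nonlinear in the lags.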
Dynamic Features for Visual Speechreading: A Systematic Comparison
Gray, Michael S., Movellan, Javier R., Sejnowski, Terrence J.
Humans use visual as well as auditory speech signals to recognize spoken words. A variety of systems have been investigated for performing this task. The main purpose of this research was to systematically compare the performance of a range of dynamic visual features on a speechreading task. We found that normalization of images to eliminate variation due to translation, scale, and planar rotation yielded substantial improvements in generalization performance regardless of the visual representation used. In addition, the dynamic information in the difference between successive frames yielded better performance than optical-flow-based approaches, and compression by local low-pass filtering worked surprisingly better than global principal components analysis (PCA). These results are examined and possible explanations are explored.
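A rough sketch of the best-performing feature pipeline as we read the abstract: differences between successive (already normalized) frames, compressed by local low-pass filtering and subsampling rather than by a global PCA projection. The filter width and subsampling factor here are illustrative, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def delta_frame_features(frames, sigma=2.0, scale=0.25):
    """Dynamic features from a (T, H, W) sequence of normalized mouth
    images: temporal differences of successive frames, compressed by
    local low-pass (Gaussian) filtering and spatial subsampling."""
    feats = []
    for t in range(1, len(frames)):
        delta = frames[t] - frames[t - 1]           # dynamic information
        smooth = gaussian_filter(delta, sigma)      # local low-pass filter
        feats.append(zoom(smooth, scale).ravel())   # spatial compression
    return np.array(feats)
```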
Estimating Equivalent Kernels for Neural Networks: A Data Perturbation Approach
The perturbation method we have presented overcomes the limitations of standard approaches, which are appropriate only for models with a single layer of adjustable weights, albeit at considerable computational expense. It has the added bonus of automatically taking into account the effect of regularisation techniques such as weight decay. The experimental results illustrate the application of the technique to two simple problems. As expected, the number of degrees of freedom in the models is found to be related to the amount of weight decay used during training. The equivalent kernels are found to vary significantly in different regions of input space, and the functions reconstructed from the estimated smoother matrices closely match the originals.
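The following sketch shows the general idea as we read it: perturb each training target in turn, retrain, and record the change in the fitted values to estimate the smoother matrix S, whose trace gives the effective degrees of freedom. The `train_fn` interface is our assumption; the ridge-regression stand-in is included only because its smoother matrix is known in closed form, whereas the paper applies the idea to networks trained with weight decay.

```python
import numpy as np

def estimate_smoother_matrix(train_fn, X, y, eps=1e-3):
    """Finite-difference estimate of the smoother matrix S, where
    S[i, j] ~ d f(x_i) / d y_j: perturb each training target, retrain,
    and record the change in fitted values. train_fn(X, y) must return
    the model's predictions on X after training. The effective degrees
    of freedom of the fit is np.trace(S)."""
    f0 = train_fn(X, y)
    n = len(y)
    S = np.zeros((n, n))
    for j in range(n):
        y_pert = y.copy()
        y_pert[j] += eps
        S[:, j] = (train_fn(X, y_pert) - f0) / eps
    return S

# Linear-smoother sanity check: ridge regression with weight decay lam,
# for which trace(S) recovers the classical degrees-of-freedom formula.
def make_ridge_train_fn(lam):
    def train_fn(X, y):
        w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
        return X @ w
    return train_fn
```

Each row of the estimated S is an equivalent kernel, so plotting rows at different input points shows how the kernel varies across input space.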
ICMAS '96: Norms, Obligations, and Conventions
The Second International Conference on Multiagent Systems (ICMAS-96) Workshop on Norms, Obligations, and Conventions was held in Kyoto, Japan, from 10 to 13 December 1996. Participants included researchers from the fields of deontic logic, database frameworks, decision theory, agent architecture, cognitive modeling, and legal expert systems. This article summarizes the contributions chosen for presentation and their links to these areas.
Statistical Techniques for Natural Language Parsing
I review current statistical work on syntactic parsing. I first consider part-of-speech tagging, which was the first syntactic problem to be successfully attacked by statistical techniques and which also serves as a good warm-up for the main topic: statistical parsing. Here, I consider both the simplified case in which the input string is viewed as a string of parts of speech and the more interesting case in which the parser is guided by statistical information about the particular words in the sentence. Finally, I anticipate future research directions.
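For concreteness, a common statistical formulation of part-of-speech tagging is a hidden Markov model decoded with the Viterbi algorithm; the sketch below is a generic version of that idea, not the article's specific system, and all parameter names are ours.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely tag sequence under a bigram HMM part-of-speech tagger.
    obs: word indices, length T; pi: initial tag probabilities (K,);
    A[i, j] = P(tag_j | tag_i); B[i, w] = P(word_w | tag_i).
    Log-space dynamic programming with backpointers."""
    T, K = len(obs), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA   # (K, K): previous tag x next tag
        back[t] = scores.argmax(axis=0)  # best predecessor for each tag
        delta = scores.max(axis=0) + logB[:, obs[t]]
    tags = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):        # follow backpointers
        tags.append(int(back[t][tags[-1]]))
    return tags[::-1]
```

The "simplified case" in the text corresponds to feeding a parser the tag sequence this decoder outputs, rather than the words themselves.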
Empirical Methods in Information Extraction
This article surveys the use of empirical, machine-learning methods for a particular natural-language-understanding task: information extraction. The author presents a generic architecture for information-extraction systems and then surveys the learning algorithms that have been developed to address the problems of accuracy, portability, and knowledge acquisition for each component of the architecture.