Europe
Discontinuous Generalization in Large Committee Machines
H. Schwarze Dept. of Theoretical Physics Lund University Solvegatan 14A 223 62 Lund Sweden J.Hertz Nordita Blegdamsvej 17 2100 Copenhagen 0 Denmark Abstract The problem of learning from examples in multilayer networks is studied within the framework of statistical mechanics. Using the replica formalism we calculate the average generalization error of a fully connected committee machine in the limit of a large number of hidden units. If the number of training examples is proportional to the number of inputs in the network, the generalization error as a function of the training set size approaches a finite value. If the number of training examples is proportional to the number of weights in the network we find first-order phase transitions with a discontinuous drop in the generalization error for both binary and continuous weights. 1 INTRODUCTION Feedforward neural networks are widely used as nonlinear, parametric models for the solution of classification tasks and function approximation. Trained from examples of a given task, they are able to generalize, i.e. to compute the correct output for new, unknown inputs.
Agnostic PAC-Learning of Functions on Analog Neural Nets
Abstract: There exist a number of negative results ([J), [BR), [KV]) about learning on neural nets in Valiant's model [V) for probably approximately correctlearning ("PAClearning"). These negative results are based on an asymptotic analysis where one lets the number of nodes in the neural net go to infinit.y. Hence this analysis is less adequate forthe investigation of learning on a small fixed neural net.
Supervised Learning with Growing Cell Structures
Center positions are continuously updated through soft competitive learning. The width of the radial basis functions is derived from the distance to topological neighbors. During the training the observed error is accumulated locally and used to determine where to insert the next unit. This leads (in case of classification problems) to the placement of units near class borders rather than near frequency peaks as is done by most existing methods. The resulting networks need few training epochs and seem to generalize very well. This is demonstrated by examples.
Unsupervised Parallel Feature Extraction from First Principles
EE., Linkoping University S-58183 Linkoping Sweden Abstract We describe a number of learning rules that can be used to train unsupervised parallelfeature extraction systems. The learning rules are derived using gradient ascent of a quality function. We consider anumber of quality functions that are rational functions of higher order moments of the extracted feature values. We show that one system learns the principle components of the correlation matrix.Principal component analysis systems are usually not optimal feature extractors for classification. Therefore we design quality functions which produce feature vectors that support unsupervised classification.The properties of the different systems are compared with the help of different artificially designed datasets and a database consisting of all Munsell color spectra. 1 Introduction There are a number of unsupervised Hebbian learning algorithms (see Oja, 1992 and references therein) that perform some version of the Karhunen-Loeve expansion.
Central and Pairwise Data Clustering by Competitive Neural Networks
Buhmann, Joachim, Hofmann, Thomas
Data clustering amounts to a combinatorial optimization problem to reduce thecomplexity of a data representation and to increase its precision. Central and pairwise data clustering are studied in the maximum entropy framework.For central clustering we derive a set of reestimation equations and a minimization procedure which yields an optimal number ofclusters, their centers and their cluster probabilities. A meanfield approximation for pairwise clustering is used to estimate assignment probabilities. A se1fconsistent solution to multidimensional scaling and pairwise clustering is derived which yields an optimal embedding and clustering of data points in a d-dimensional Euclidian space. 1 Introduction A central problem in information processing is the reduction of the data complexity with minimal loss in precision to discard noise and to reveal basic structure of data sets. Data clustering addresses this tradeoff by optimizing a cost function which preserves the original data as complete as possible and which simultaneously favors prototypes with minimal complexity (Linde et aI., 1980; Gray, 1984; Chou et aI., 1989; Rose et ai., 1990). We discuss anobjective function for the joint optimization of distortion errors and the complexity of a reduced data representation.
An Introduction to Least Commitment Planning
Recent developments have clarified the process of generating partially ordered, partially specified sequences of actions whose execution will achieve an agent's goal. This article summarizes a progression of least commitment planners, starting with one that handles the simple STRIPS representation and ending with UCPOP, a planner that manages actions with disjunctive precondition, conditional effects, and universal quantification over dynamic universes. Along the way, I explain how Chapman's formulation of the modal truth criterion is misleading and why his NP-completeness result for reasoning about plans with conditional effects does not apply to UCPOP.
Comparative Analysis of AI Planning Systems: A Report on the AAAI Workshop
Kambhampati presented theoretical planning systems is difficult. Although national AI conference, was lively It was noted that comparing planners encoding expert knowledge is at the and interesting. Both the theoretical is similar in difficulty to comparing heart of HTN planning, there and practical sides of the AI planning programming languages (in fact, the remains a considerable gap to bridge community were represented, input specifications to a planner can in using expert planning knowledge and both sides seemed to understand be viewed as a programming language). Shlomo Zilberstein (University Third, it was generally acknowledged Several papers contributed further of Massachusetts) presented a that common plan representations to the theoretical analysis of number of evaluation measures. A algorithms or through empirical An integrated system that executes common representation would allow studies (Christer Backstrom, or uses the generated plans formal comparisons among widely Linkoping University, Sweden; Subbarao should be evaluated instead of simply different planning technologies.
Applied AI News
MT Telecom, a Dutch telecommunications utility, has installed expert BNR Europe (Harlow, England), the instrument aboard the satellite. Pending system-based help desk systems to R&D subsidiary of telecommunications NASA approval, EUVE will be the centralize its 23 networked local data equipment supplier Northern first orbiting astrophysics mission to Telecom, is using virtual reality technology replace humans with AI technology. This installation proved to be a critical planning. The VR system allows Re:Member Data Services (Memphis, factor in helping the company BNR's engineers to visualize complex Tenn.), a data processor for obtain the IS0 9000 Total Quality installations and how they will work, credit union software services, has System Standard certification, a greatly saving time and effort compared automated all company service and requirement for those organizations to the traditional CAD system. Continental Bank (Chicago, Ill.) has expert system tracks all requests developed a client/server-based intelligent called in by users, and all requests Lockheed Missiles ST Space (Palo application to improve the can be accessed by anyone at the Alto, Calif.) has developed ASAP quality of its customer service.
The 1993 International Logic Programming Symposium
The 1993 International Logic Programming Symposium was held in Vancouver, British Columbia, on 26-29 October. It presented the state of the art in logic programming, emphasizing the deliberate interaction with other fields, in particular, humanistic fields. Topics covered at the symposium included algorithmic analysis, programming methodologies, semantic analysis, deductive databases, and programming language design.
Operations for Learning with Graphical Models
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, andthe manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximizationalgorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper concludes by sketching some implications for data analysis and summarizing how some popular algorithms fall within the framework presented. The main original contributions here are the decompositiontechniques and the demonstration that graphical models provide a framework for understanding and developing complex learning algorithms.