AITopics

Data clustering amounts to a combinatorial optimization problem to reduce the complexity of a data representation and to increase its precision. Central and pairwise data clustering are studied in the maximum entropy framework. For central clustering we derive a set of reestimation equations and a minimization procedure which yields an optimal number of clusters, their centers and their cluster probabilities. A meanfield approximation for pairwise clustering is used to estimate assignment probabilities. A se1fconsistent solution to multidimensional scaling and pairwise clustering is derived which yields an optimal embedding and clustering of data points in a d-dimensional Euclidian space. 1 Introduction A central problem in information processing is the reduction of the data complexity with minimal loss in precision to discard noise and to reveal basic structure of data sets. Data clustering addresses this tradeoff by optimizing a cost function which preserves the original data as complete as possible and which simultaneously favors prototypes with minimal complexity (Linde et aI., 1980; Gray, 1984; Chou et aI., 1989; Rose et ai., 1990). We discuss an objective function for the joint optimization of distortion errors and the complexity of a reduced data representation. A maximum entropy estimation of the cluster assignments yields a unifying framework for clustering algorithms with a number of different distortion and complexity measures. The close analogy of complexity optimized clustering with winner-take-all neural networks suggests a neural-like implementation resembling topological feature maps (see Figure 1).

central and pairwise data clustering, complexity, free energy, (11 more...)

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Gold, Steven, Mjolsness, Eric, Rangarajan, Anand

Clustering with a Domain-Specific Distance Measure

The distance measure and learning problem are formally described as nested objective functions. We derive an efficient algorithm by using optimization techniques that allow us to divide up the objective function into parts which may be minimized in distinct phases. The algorithm has accurately recreated 10 prototypes from a randomly generated sample database of 100 images consisting of 20 points each in 120 experiments. Finally, by incorporating permutation invariance in our distance measure, we have a technique that we may be able to apply to the clustering of graphs. Our goal is to develop measures which will enable the learning of objects with shape or structure. Acknowledgements This work has been supported by AFOSR grant F49620-92-J-0465 and ONR/DARPA grant N00014-92-J-4048.

algorithm, cluster center, distance measure, (15 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Industry: Government > Regional Government > North America Government > United States Government (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Bengio, Yoshua, Frasconi, Paolo

Credit Assignment through Time: Alternatives to Backpropagation

Learning to recognize or predict sequences using long-term context has many applications. However, practical and theoretical problems are found in training recurrent neural networks to perform tasks in which input/output dependencies span long intervals. Starting from a mathematical analysis of the problem, we consider and compare alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled. Results on the new algorithms show performance qualitatively superior to that obtained with backpropagation. 1 Introduction Recurrent neural networks have been considered to learn to map input sequences to output sequences. Machines that could efficiently learn such tasks would be useful for many applications involving sequence prediction, recognition or production. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. In fact, we can prove that dynamical systems such as recurrent neural networks will be increasingly difficult to train with gradient descent as the duration of the dependencies to be captured increases. A mathematical analysis of the problem shows that either one of two conditions arises in such systems.

algorithm, information, sequence, (13 more...)

Country:

North America > Canada > Quebec > Montreal (0.05)
Asia > Middle East > Jordan (0.05)
Europe > Italy (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.52)

Maron, Oded, Moore, Andrew W.

Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation

Such decisions include which function approximator to use, how to trade smoothness for goodness of fit and which features are relevant.

algorithm, hoeffding race, iteration, (12 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.40)

Bregler, Christoph, Omohundro, Stephen M.

Surface Learning with Applications to Lipreading

Most connectionist research has focused on learning mappings from one space to another (eg.

dimension, query, surface learning, (15 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.05)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > United States > Illinois (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Levin, Asriel U., Leen, Todd K., Moody, John E.

Fast Pruning Using Principal Components

The assumption is that there exists an underlying (possibly noisy) functional relationship relating the outputs to the inputs y /(u,e) where e denotes the noise. The aim of the learning process is to approximate this relationship based on the the training set.

algorithm, correlation matrix, hessian, (14 more...)

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Unsupervised Learning of Mixtures of Multiple Causes in Binary Data

Saund, Eric

This paper presents a formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data. Unlike the standard mixture model, a multiple cause model accounts for observed data by combining assertions from many hidden causes, each of which can pertain to varying degree to any subset of the observable dimensions. A crucial issue is the mixing-function for combining beliefs from different cluster-centers in order to generate data reconstructions whose errors are minimized both during recognition and learning. We demonstrate a weakness inherent to the popular weighted sum followed by sigmoid squashing, and offer an alternative form of the nonlinearity. Results are presented demonstrating the algorithm's ability successfully to discover coherent multiple causal representat.ions of noisy test data and in images of printed characters. 1 Introduction The objective of unsupervised learning is to identify patterns or features reflecting underlying regularities in data. Single-cause techniques, including the k-means algorithm and the standard mixture-model (Duda and Hart, 1973), represent clusters of data points sharing similar patterns of Is and Os under the assumption that each data point belongs to, or was generated by, one and only one cluster-center; output activity is constrained to sum to 1. In contrast, a multiple-cause model permits more than one cluster-center to become fully active in accounting for an observed data vector.

prediction, representation, unsupervised learning, (13 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > New York (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.87)

Das, Sreerupa, Mozer, Michael C.

A Unified Gradient-Descent/Clustering Architecture for Finite State Machine Induction

Researchers often try to understand-post hoc-representations that emerge in the hidden layers of a neural net following training. Interpretation is difficult because these representations are typically highly distributed and continuous. By "continuous," we mean that if one constructed a scatterplot over the hidden unit activity space of patterns obtained in response to various inputs, examination at any scale would reveal the patterns to be broadly distributed over the space.

dolce, representation, unified gradient-descent clustering architecture, (12 more...)

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.43)

Elfadel, I. M., J. L. Wyatt, Jr.

The "Softmax" Nonlinearity: Derivation Using Statistical Mechanics and Useful Properties as a Multiterminal Analog Circuit Element

Reciprocal circuit elements facilitate such an implementation since they 882 The "Softmax" Nonlinearity 883 can be combined with other reciprocal elements to form an analog network having Lyapunov-like functions: the network content or co-content. In this paper, we show a reciprocal implementation of the "softmax" nonlinearity that is usually used to enforce local competition between neurons [Peterson, 1989]. We show that the circuit ispassive and incrementally passive, and we explicitly compute its content and co-content functions. This circuit adds a new element to the library of the analog circuit designer that can be combined with reciprocal constraint boxes [Harris, 1988] and nonlinear resistive fuses [Harris, 1989] to form fast, analog VLSI optimization networks.

circuit implementation, implementation, softmax, (10 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.15)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > California > San Diego County > San Diego (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)

Wang, Changfeng, Venkatesh, Santosh S., Judd, J. Stephen

Optimal Stopping and Effective Machine Complexity in Learning

We study tltt' problem of when to stop If'arning a class of feedforward networks - networks with linear outputs I1PUrOIl and fixed input weights - when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there a.re in general three distinct phases in the generalization performance in the learning process, and in particular, the network has hetter gt'neralization pPTformance when learning is stopped at a certain time before til(' global miniIl111lu of the empirical error is reachert. A notion of effective size of a machine is rtefil1e i and used to explain the tradeoff betwf'en the complexity of the marhine and the training error ill the learning process. The study leads nat.urally to a network size selection critt'rion, which turns Ol1t to be a generalization of Akaike's Information Criterioll for the It'arning process. It if; shown that stopping Iparning before tiJt' global minimum of the empirical error has the effect of network size splectioll. 1 INTRODUCTION The primary goal of learning in neural nets is to find a network that gives valid generalization. In achieving this goal, a central issue is the tradeoff between the training error and network complexity. This usually reduces to a problem of network size selection, which has drawn much research effort in recent years. Various principles, theories, and intuitions, including Occam's razor, statistical model selection criteria such as Akaike's Information Criterion (AIC) [11 and many others [5, 1, 10,3,111 all quantitatively support the following PAC prescription: between two machines which have the same empirical error, the machine with smaller VC-dimf'nsion generalizes better. However, it is noted that these methods or criteria do not npcpssarily If'ad to optimal (or llearly optimal) generalization performance.

effective size, generalization error, stopping and effective machine complexity, (11 more...)

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)