AITopics

doi: 10.1613/jair.97

AI Access Foundation

10132

Country:

Europe > Italy (0.04)
Europe > Austria > Vienna (0.04)
North America > United States (0.04)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.64)

Hinton, Geoffrey E., Zemel, Richard S.

Autoencoders, Minimum Description Length and Helmholtz Free Energy

An autoencoder network uses a set of recognition weights to convert an input vector into a code vector. It then uses a set of generative weights to convert the code vector into an approximate reconstruction of the input vector. We derive an objective function for training autoencoders based on the Minimum Description Length (MDL) principle. The aim is to minimize the information required to describe both the code vector and the reconstruction error. We show that this information is minimized by choosing code vectors stochastically according to a Boltzmann distribution, where the generative weights define the energy of each possible code vector given the input vector. Unfortunately, if the code vectors use distributed representations, it is exponentially expensive to compute this Boltzmann distribution because it involves all possible code vectors. We show that the recognition weights of an autoencoder can be used to compute an approximation to the Boltzmann distribution and that this approximation gives an upper bound on the description length. Even when this bound is poor, it can be used as a Lyapunov function for learning both the generative and the recognition weights. We demonstrate that this approach can be used to learn factorial codes.

input vector, probability distribution, reconstruction cost, (12 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
Asia > Singapore (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.92)

Agnostic PAC-Learning of Functions on Analog Neural Nets

Maass, Wolfgang

Abstract: There exist a number of negative results ([J), [BR), [KV]) about learning on neural nets in Valiant's model [V) for probably approximately correct learning ("PAClearning"). These negative results are based on an asymptotic analysis where one lets the number of nodes in the neural net go to infinit.y. Hence this analysis is less adequate for the investigation of learning on a small fixed neural net.

activation function, agnostic pac-learning, programmable parameter, (14 more...)

Country: Europe > Austria > Styria > Graz (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Agnostic PAC-Learning of Functions on Analog Neural Nets

Maass, Wolfgang

activation function, agnostic pac-learning, programmable parameter, (14 more...)

Country: Europe > Austria > Styria > Graz (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Zemel, Richard S., Hinton, Geoffrey E.

Developing Population Codes by Minimizing Description Length

The Minimum Description Length principle (MDL) can be used to train the hidden units of a neural network to extract a representation that is cheap to describe but nonetheless allows the input to be reconstructed accurately. We show how MDL can be used to develop highly redundant population codes. Each hidden unit has a location in a low-dimensional implicit space. If the hidden unit activities form a bump of a standard shape in this space, they can be cheaply encoded by the center ofthis bump. So the weights from the input units to the hidden units in an autoencoder are trained to make the activities form a standard bump.

algorithm, implicit coordinate, implicit space, (14 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Singapore (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.50)

Zemel, Richard S., Hinton, Geoffrey E.

Developing Population Codes by Minimizing Description Length

The Minimum Description Length principle (MDL) can be used to train the hidden units of a neural network to extract a representation thatis cheap to describe but nonetheless allows the input to be reconstructed accurately. We show how MDL can be used to develop highly redundant population codes. Each hidden unit has a location in a low-dimensional implicit space. If the hidden unit activities form a bump of a standard shape in this space, they can be cheaply encoded by the center ofthis bump. So the weights from the input units to the hidden units in an autoencoder are trained to make the activities form a standard bump.

artificial intelligence, implicit space, machine learning, (16 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.49)

Hinton, Geoffrey E., Zemel, Richard S.

Autoencoders, Minimum Description Length and Helmholtz Free Energy

An autoencoder network uses a set of recognition weights to convert an input vector into a code vector. It then uses a set of generative weights to convert the code vector into an approximate reconstruction of the input vector. We derive an objective function for training autoencoders based on the Minimum Description Length (MDL) principle. The aim is to minimize the information required to describe both the code vector and the reconstruction error. We show that this information is minimized by choosing code vectors stochastically according to a Boltzmann distribution, wherethe generative weights define the energy of each possible code vector given the input vector. Unfortunately, if the code vectors use distributed representations, it is exponentially expensive to compute this Boltzmann distribution because it involves all possible code vectors. We show that the recognition weights of an autoencoder can be used to compute an approximation to the Boltzmann distribution and that this approximation givesan upper bound on the description length. Even when this bound is poor, it can be used as a Lyapunov function for learning both the generative and the recognition weights. We demonstrate that this approach can be used to learn factorial codes.

artificial intelligence, input vector, machine learning, (14 more...)

Country: North America > Canada > Ontario > Toronto (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.92)

Counting function theorem for multi-layer networks

Kowalczyk, Adam

If N hin then such a perceptron must have all units of the first hidden layer fully connected to inputs. This implies the maximal capacities (in the sense of Cover) of 2n input patterns per hidden unit and 2 input patterns per synaptic weight of such networks (both capacities are achieved by networks with single hidden layer and are the same as for a single neuron). Comparing these results with recent estimates of VC-dimension we find that in contrast to the single neuron case, for sufficiently large nand hl, the VC-dimension exceeds Cover's capacity. 1 Introduction In the course of theoretical justification of many of the claims made about neural networks regarding their ability to learn a set of patterns and their ability to generalise, variousconcepts of maximal storage capacity were developed. In particular Cover's capacity [4] and VC-dimension [12] are two expressions of this notion and are of special interest here. We should stress that both capacities are not easy to compute and are presen tly known in a few particular cases of feedforward networks only.VC-dimension, in spite of being introduced much later, has been far 375 376 Kowalczyk more researched, perhaps due to its significance expressed by a well known relation between generalisation and learning errors [12, 3].

artificial intelligence, machine learning, vc-dimension, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.75)

Agnostic PAC-Learning of Functions on Analog Neural Nets

Maass, Wolfgang

Abstract: There exist a number of negative results ([J), [BR), [KV]) about learning on neural nets in Valiant's model [V) for probably approximately correctlearning ("PAClearning"). These negative results are based on an asymptotic analysis where one lets the number of nodes in the neural net go to infinit.y. Hence this analysis is less adequate forthe investigation of learning on a small fixed neural net.

activation function, artificial intelligence, machine learning, (16 more...)

Country: Europe > Austria (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Cook, D. J., Holder, L. B.

Substructure Discovery Using Minimum Description Length and Background Knowledge

Journal of Artificial Intelligence ResearchFeb-1-1994

The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the minimum description length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUBDUE's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain. Description of Online Appendix: This is a compressed tar file containing the SUBDUE discovery system, written in C. The program accepts as input databases represented in graph form, and will output discovered substructures with their corresponding value.

ch 2, ch 3, description length and background knowledge, (3 more...)

doi: 10.1613/jair.43

AI Access Foundation

10113

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.80)