 Computational Learning Theory


Pac-Learning Recursive Logic Programs: Efficient Algorithms

Journal of Artificial Intelligence Research

We present algorithms that learn certain classes of function-free recursive logic programs in polynomial time from equivalence queries. In particular, we show that a single k-ary recursive constant-depth determinate clause is learnable. Two-clause programs consisting of one learnable recursive clause and one constant-depth determinate non-recursive clause are also learnable, if an additional "basecase" oracle is assumed. These results immediately imply the pac-learnability of these classes. Although these classes of learnable recursive programs are very constrained, it is shown in a companion paper that they are maximally general, in that generalizing either class in any natural way leads to a computationally difficult learning problem. Thus, taken together with its companion paper, this paper establishes a boundary of efficient learnability for recursive logic programs.


Autoencoders, Minimum Description Length and Helmholtz Free Energy

Neural Information Processing Systems

An autoencoder network uses a set of recognition weights to convert an input vector into a code vector. It then uses a set of generative weights to convert the code vector into an approximate reconstruction of the input vector. We derive an objective function for training autoencoders based on the Minimum Description Length (MDL) principle. The aim is to minimize the information required to describe both the code vector and the reconstruction error. We show that this information is minimized by choosing code vectors stochastically according to a Boltzmann distribution, where the generative weights define the energy of each possible code vector given the input vector. Unfortunately, if the code vectors use distributed representations, it is exponentially expensive to compute this Boltzmann distribution because it involves all possible code vectors. We show that the recognition weights of an autoencoder can be used to compute an approximation to the Boltzmann distribution and that this approximation gives an upper bound on the description length. Even when this bound is poor, it can be used as a Lyapunov function for learning both the generative and the recognition weights. We demonstrate that this approach can be used to learn factorial codes.
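The claim that a Boltzmann distribution minimizes the expected description length, and that any other distribution over codes yields an upper bound, can be checked numerically on a toy case. The sketch below is illustrative only and is not the paper's training procedure: the three code energies are hypothetical values, the temperature is fixed at 1, and costs are measured in nats.

```python
import math

def free_energy(energies, q):
    """Expected energy under q minus the entropy of q (in nats)."""
    total = 0.0
    for q_i, e_i in zip(q, energies):
        if q_i > 0:
            total += q_i * e_i + q_i * math.log(q_i)
    return total

def boltzmann(energies):
    """Distribution with q_i proportional to exp(-E_i), temperature 1."""
    lowest = min(energies)  # shift for numerical stability
    weights = [math.exp(-(e - lowest)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical energies (description costs) of three candidate code vectors.
energies = [1.0, 2.0, 4.0]

q_star = boltzmann(energies)
optimal = free_energy(energies, q_star)

# At the Boltzmann distribution the free energy equals -log Z.
neg_log_z = -math.log(sum(math.exp(-e) for e in energies))

# Any other distribution, here a uniform one standing in for an
# approximate recognition distribution, can only give a larger value.
approximate = free_energy(energies, [1 / 3, 1 / 3, 1 / 3])
```

With exhaustive enumeration this is trivial; the abstract's point is that enumeration becomes exponentially expensive for distributed codes, which is why an approximate distribution and its bound are used instead.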


Agnostic PAC-Learning of Functions on Analog Neural Nets

Neural Information Processing Systems

Abstract: There exist a number of negative results ([J], [BR], [KV]) about learning on neural nets in Valiant's model [V] for probably approximately correct learning ("PAC-learning"). These negative results are based on an asymptotic analysis where one lets the number of nodes in the neural net go to infinity. Hence this analysis is less adequate for the investigation of learning on a small fixed neural net.


Developing Population Codes by Minimizing Description Length

Neural Information Processing Systems

The Minimum Description Length principle (MDL) can be used to train the hidden units of a neural network to extract a representation that is cheap to describe but nonetheless allows the input to be reconstructed accurately. We show how MDL can be used to develop highly redundant population codes. Each hidden unit has a location in a low-dimensional implicit space. If the hidden unit activities form a bump of a standard shape in this space, they can be cheaply encoded by the center of this bump. So the weights from the input units to the hidden units in an autoencoder are trained to make the activities form a standard bump.
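As a rough illustration of encoding hidden activities by the center of a bump, the sketch below places hidden units on a one-dimensional implicit space and summarizes their activities with a single number. Everything here is an assumption made for illustration: the Gaussian bump shape, the unit positions, the `width` parameter, and the activity-weighted mean used as a stand-in for fitting the center are not taken from the paper.

```python
import math

def bump(center, positions, width=1.0):
    """Gaussian-shaped activity profile over units at the given positions."""
    return [math.exp(-((p - center) ** 2) / (2 * width ** 2)) for p in positions]

def fit_center(activities, positions):
    """Estimate the bump center as the activity-weighted mean position."""
    total = sum(activities)
    return sum(a * p for a, p in zip(activities, positions)) / total

# Eleven hypothetical hidden units spaced evenly along the implicit space.
positions = [float(i) for i in range(11)]

# Activities that already form a standard bump centered on unit 5.
activities = bump(5.0, positions)

# The whole activity vector is summarized by one number, the center.
center = fit_center(activities, positions)

# Squared error between the actual activities and the bump implied by the
# center; this is the part that would still have to be described.
residual = sum((a - b) ** 2 for a, b in zip(activities, bump(center, positions)))
```

When the activities match the standard shape, the residual is essentially zero and the code costs little more than the center itself; activities that deviate from the bump leave a larger residual to encode.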


Counting function theorem for multi-layer networks

Neural Information Processing Systems

If N > h1 n, then such a perceptron must have all units of the first hidden layer fully connected to inputs. This implies the maximal capacities (in the sense of Cover) of 2n input patterns per hidden unit and 2 input patterns per synaptic weight of such networks (both capacities are achieved by networks with a single hidden layer and are the same as for a single neuron). Comparing these results with recent estimates of VC-dimension, we find that, in contrast to the single neuron case, for sufficiently large n and h1 the VC-dimension exceeds Cover's capacity.

1 Introduction

In the course of theoretical justification of many of the claims made about neural networks regarding their ability to learn a set of patterns and their ability to generalise, various concepts of maximal storage capacity were developed. In particular, Cover's capacity [4] and VC-dimension [12] are two expressions of this notion and are of special interest here. We should stress that neither capacity is easy to compute, and both are presently known only in a few particular cases of feedforward networks. VC-dimension, in spite of being introduced much later, has been far more researched, perhaps due to its significance expressed by a well known relation between generalisation and learning errors [12, 3].
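For reference, the single-neuron case mentioned above can be made concrete with Cover's classical counting function: a threshold unit with d adjustable weights realizes 2 * sum_{i=0}^{d-1} C(N-1, i) dichotomies of N points in general position. The sketch below covers only that classical single-neuron formula, not the multi-layer result of this paper; the function names are hypothetical.

```python
from math import comb

def cover_count(num_points, num_weights):
    """Dichotomies of num_points points in general position that a single
    threshold unit with num_weights adjustable weights can realize."""
    return 2 * sum(comb(num_points - 1, i) for i in range(num_weights))

def fraction_realizable(num_points, num_weights):
    """Share of all 2**num_points dichotomies that the unit can realize."""
    return cover_count(num_points, num_weights) / 2 ** num_points

# Four points in the plane, unit with two input weights plus a bias.
planar = cover_count(4, 3)

# At two patterns per weight exactly half of all dichotomies are realizable.
at_capacity = fraction_realizable(10, 5)

# With no more points than weights every dichotomy is realizable.
below_capacity = fraction_realizable(3, 3)
```

The value of one half at N = 2d is what "2 input patterns per synaptic weight" refers to in the single-neuron setting.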


Substructure Discovery Using Minimum Description Length and Background Knowledge

Journal of Artificial Intelligence Research

The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the minimum description length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUBDUE's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain. Description of Online Appendix: This is a compressed tar file containing the SUBDUE discovery system, written in C. The program accepts as input databases represented in graph form, and will output discovered substructures with their corresponding value.