Goto

Collaborating Authors

 Oceania


Hebb Learning of Features based on their Information Content

Neural Information Processing Systems

This paper investigates the stationary points of a Hebb learning rule with a sigmoid nonlinearity in it. We show mathematically that when the input has a low information content, as measured by the input's variance, this learning rule suppresses learning, that is, forces the weight vector to converge to the zero vector. When the information content exceeds a certain value, the rule will automatically begin to learn a feature in the input. Our analysis suggests that under certain conditions it is the first principal component that is learned. The weight vector length remains bounded, provided the variance of the input is finite. Simulations confirm the theoretical results derived.



For Valid Generalization the Size of the Weights is More Important than the Size of the Network

Neural Information Processing Systems

Baum and Haussler [4] used these results to give sample size bounds for multi-layer threshold networks Generalization and the Size of the Weights in Neural Networks 135 that grow at least as quickly as the number of weights (see also [7]). However, for pattern classification applications the VC-bounds seem loose; neural networks often perform successfully with training sets that are considerably smaller than the number of weights. This paper shows that for classification problems on which neural networks perform well, if the weights are not too big, the size of the weights determines the generalization performance. In contrast with the function classes and algorithms considered in the VC-theory, neural networks used for binary classification problems have real-valued outputs, and learning algorithms typically attempt to minimize the squared error of the network output over a training set. As well as encouraging the correct classification, this tends to push the output away from zero and towards the target values of { -1, I}.


For Valid Generalization the Size of the Weights is More Important than the Size of the Network

Neural Information Processing Systems

Baum and Haussler [4] used these results to give sample size bounds for multi-layer threshold networks Generalization and the Size ofthe Weights in Neural Networks 135 that grow at least as quickly as the number of weights (see also [7]). However, for pattern classification applications the VC-bounds seem loose; neural networks often perform successfully with training sets that are considerably smaller than the number of weights. This paper shows that for classification problems on which neural networksperform well, if the weights are not too big, the size of the weights determines the generalization performance. In contrast with the function classes and algorithms considered in the VC-theory, neural networks used for binary classification problems have real-valued outputs, and learning algorithms typically attempt to minimize the squared error of the network output over a training set. As well as encouraging the correct classification, this tends to push the output away from zero and towards the target values of { -1, I}.


Interpolating Earth-science Data using RBF Networks and Mixtures of Experts

Neural Information Processing Systems

We present a mixture of experts (ME) approach to interpolate sparse, spatially correlated earth-science data. Kriging is an interpolation method which uses a global covariation model estimated from the data to take account of the spatial dependence in the data. Based on the close relationship between kriging and the radial basis function (RBF) network (Wan & Bone, 1996), we use a mixture of generalized RBF networks to partition the input space into statistically correlated regions and learn the local covariation model of the data in each region. Applying the ME approach to simulated and real-world data, we show that it is able to achieve good partitioning of the input space, learn the local covariation models and improve generalization.


Hebb Learning of Features based on their Information Content

Neural Information Processing Systems

This paper investigates the stationary points of a Hebb learning rule with a sigmoid nonlinearity in it. We show mathematically that when the input has a low information content, as measured by the input's variance, this learning rule suppresses learning, that is, forces the weight vector to converge to the zero vector. When the information content exceeds a certain value, the rule will automatically begin to learn a feature in the input. Our analysis suggests that under certain conditions it is the first principal component that is learned. The weight vector length remains bounded, provided the variance of the input is finite. Simulations confirm the theoretical results derived.


Identifying Hierarchical Structure in Sequences: A linear-time algorithm

Journal of Artificial Intelligence Research

SEQUITUR is an algorithm that infers a hierarchical structure from a sequence of discrete symbols by replacing repeated phrases with a grammatical rule that generates the phrase, and continuing this process recursively. The result is a hierarchical representation of the original sequence, which offers insights into its lexical structure. The algorithm is driven by two constraints that reduce the size of the grammar, and produce structure as a by-product. SEQUITUR breaks new ground by operating incrementally. Moreover, the method's simple structure permits a proof that it operates in space and time that is linear in the size of the input. Our implementation can process 50,000 symbols per second and has been applied to an extensive range of real world sequences.



Improved Heterogeneous Distance Functions

Journal of Artificial Intelligence Research

Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes.


Experiments with Neural Networks for Real Time Implementation of Control

Neural Information Processing Systems

This paper describes a neural network based controller for allocating capacity in a telecommunications network. This system was proposed in order to overcome a "real time" response constraint. Two basic architectures are evaluated: 1) a feedforward network-heuristic and; 2) a feedforward network-recurrent network. These architectures are compared against a linear programming (LP) optimiser as a benchmark. This LP optimiser was also used as a teacher to label the data samples for the feedforward neural network training algorithm. It is found that the systems are able to provide a traffic throughput of 99% and 95%, respectively, of the throughput obtained by the linear programming solution. Once trained, the neural network based solutions are found in a fraction of the time required by the LP optimiser.