A Method for Learning From Hints

Neural Information Processing Systems

We address the problem of learning an unknown function by putting together several pieces of information (hints) that we know about the function. We introduce a method that generalizes learning from examples to learning from hints. A canonical representation of hints is defined and illustrated for new types of hints. All the hints are represented to the learning process by examples, and examples of the function are treated on equal footing with the rest of the hints. During learning, examples from different hints are selected for processing according to a given schedule. We present two types of schedules: fixed schedules that specify the relative emphasis of each hint, and adaptive schedules that are based on how well each hint has been learned so far. Our learning method is compatible with any descent technique that we may choose to use.
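The adaptive schedule described in this abstract can be sketched on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the target x^2, the two-parameter model, the evenness hint f(x) = f(-x), the probe points used to estimate each hint's error, and the rule "process the hint with the largest current error". A fixed schedule would replace `adaptive_pick` with a draw weighted by each hint's relative emphasis.

```python
import random

def target(x):
    # Stand-in for the unknown function; used only to generate examples.
    return x * x

def model(p, x):
    a, b = p
    return a * x * x + b * x

def example_error(p, x):
    # Hint 0: ordinary examples of the function.
    return (model(p, x) - target(x)) ** 2

def example_grad(p, x):
    e = model(p, x) - target(x)
    return (2 * e * x * x, 2 * e * x)

def evenness_error(p, x):
    # Hint 1: the function is even, represented by the example pair (x, -x).
    return (model(p, x) - model(p, -x)) ** 2

def evenness_grad(p, x):
    d = model(p, x) - model(p, -x)  # equals 2*b*x for this model
    return (0.0, 2 * d * 2 * x)

HINTS = [(example_error, example_grad), (evenness_error, evenness_grad)]

def adaptive_pick(p, probes):
    # Estimate how well each hint is learned; pick the worst one.
    errors = [sum(err(p, x) for x in probes) / len(probes) for err, _ in HINTS]
    return max(range(len(HINTS)), key=lambda m: errors[m])

def train(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    probes = [-1.0, -0.5, 0.5, 1.0]
    p = (0.0, 1.0)
    for _ in range(steps):
        m = adaptive_pick(p, probes)
        x = rng.uniform(-1.0, 1.0)
        ga, gb = HINTS[m][1](p, x)
        p = (p[0] - lr * ga, p[1] - lr * gb)  # plain gradient descent step
    return p

a, b = train()
```

Both hints are processed through the same descent step, which mirrors the abstract's point that examples of the function are treated on equal footing with the other hints.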


Using Prior Knowledge in a NNPDA to Learn Context-Free Languages

Neural Information Processing Systems

Language inference and automata induction using recurrent neural networks have gained considerable interest in recent years. Nevertheless, the success of these models has been mostly limited to regular languages. Additional information in the form of a priori knowledge has proved important and at times necessary for learning complex languages (Abu-Mostafa, 1990; Al-Mashouq and Reed, 1991; Omlin and Giles, 1992; Towell, 1990). These studies have demonstrated that partial information incorporated in a connectionist model guides the learning process through constraints for efficient learning and better generalization. We have previously shown that the NNPDA model can learn Deterministic Context


Optimal Depth Neural Networks for Multiplication and Related Problems

Neural Information Processing Systems

An artificial neural network (ANN) is commonly modeled by a threshold circuit, a network of interconnected processing units called linear threshold gates. The depth of a network represents the number of unit delays or the time for parallel computation. The size of a circuit is the number of gates and measures the amount of hardware. It was known that traditional logic circuits consisting of only unbounded fan-in AND, OR, NOT gates would require at least Ω(log n/log log n) depth to compute common arithmetic functions such as the product or the quotient of two n-bit numbers, unless we allow the size (and fan-in) to increase exponentially (in n). We show in this paper that ANNs can be much more powerful than traditional logic circuits. In particular, we prove that iterated addition can be computed by a depth-2 ANN, and multiplication and division can be computed by depth-3 ANNs with polynomial size and polynomially bounded integer weights, respectively. Moreover, it follows from known lower bound results that these ANNs are optimal in depth. We also indicate that these techniques can be applied to construct a polynomial-size depth-3 ANN for powering, and a depth-4 ANN for multiple product.
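To make the terms "linear threshold gate", "depth", and "size" concrete, here is a minimal sketch of a depth-2 threshold circuit. The function it computes (parity of n bits) is chosen only for illustration and is not one of the constructions in the paper; the names `threshold_gate` and `parity_depth2` are hypothetical.

```python
from itertools import product

def threshold_gate(weights, inputs, threshold):
    # Linear threshold gate: outputs 1 iff the weighted sum reaches the threshold.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def parity_depth2(bits):
    n = len(bits)
    # Layer 1 (n gates): gate k fires iff at least k inputs are 1.
    layer1 = [threshold_gate([1] * n, bits, k) for k in range(1, n + 1)]
    # Layer 2 (1 gate): alternating weights +1, -1, +1, ... sum to 1
    # exactly when an odd number of layer-1 gates fire.
    alternating = [1 if k % 2 == 0 else -1 for k in range(n)]
    return threshold_gate(alternating, layer1, 1)

# Check against the direct definition of parity on all 4-bit inputs.
all_match = all(parity_depth2(list(x)) == sum(x) % 2 for x in product([0, 1], repeat=4))
```

This circuit has depth 2, size n + 1, and integer weights of magnitude 1, so both size and weights are polynomially bounded in n.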


Efficient Pattern Recognition Using a New Transformation Distance

Neural Information Processing Systems

Memory-based classification algorithms such as radial basis functions or K-nearest neighbors typically rely on simple distances (Euclidean, dot product...), which are not particularly meaningful on pattern vectors. More complex, better suited distance measures are often expensive and rather ad-hoc (elastic matching, deformable templates). We propose a new distance measure which (a) can be made locally invariant to any set of transformations of the input and (b) can be computed efficiently. We tested the method on large handwritten character databases provided by the Post Office and the NIST. Using invariances with respect to translation, rotation, scaling, shearing and line thickness, the method consistently outperformed all other systems tested on the same databases.


Improving Performance in Neural Networks Using a Boosting Algorithm

Neural Information Processing Systems

A boosting algorithm converts a learning machine with error rate less than 50% to one with an arbitrarily low error rate. However, the algorithm discussed here depends on having a large supply of independent training samples. We show how to circumvent this problem and generate an ensemble of learning machines whose performance in optical character recognition problems is dramatically improved over that of a single network. We report the effect of boosting on four databases (all handwritten) consisting of 12,000 digits from segmented ZIP codes from the United States Postal Service (USPS) and the following from the National Institute of Standards and Technology (NIST): 220,000 digits, 45,000 upper case alphas, and 45,000 lower case alphas. We use two performance measures: the raw error rate (no rejects) and the reject rate required to achieve a 1% error rate on the patterns not rejected.
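The two performance measures named in this abstract can be computed as follows. This is a sketch under assumptions not stated in the abstract: each pattern is assumed to carry a scalar confidence score, rejection is assumed to discard the least confident patterns first, and confidences are assumed to be distinct. The function names are hypothetical.

```python
def raw_error_rate(correct):
    # Fraction of misclassified patterns when nothing is rejected.
    return 1.0 - sum(correct) / len(correct)

def reject_rate_for_target(confidence, correct, target=0.01):
    # Smallest fraction of patterns to reject (lowest confidence first)
    # so that the error rate on the accepted patterns is <= target.
    order = sorted(range(len(correct)), key=lambda i: confidence[i], reverse=True)
    errors = 0
    best_accepted = 0
    for k, i in enumerate(order, start=1):
        if not correct[i]:
            errors += 1
        if errors / k <= target:
            best_accepted = k  # accepting the k most confident patterns is allowed
    return 1.0 - best_accepted / len(correct)

# Tiny made-up data set; a 25% target is used because five patterns
# cannot resolve a 1% error rate.
confidence = [0.9, 0.8, 0.7, 0.6, 0.5]
correct = [True, True, False, True, False]
raw = raw_error_rate(correct)
rejects = reject_rate_for_target(confidence, correct, target=0.25)
```

On this toy data the four most confident patterns contain one error (25%), so only the least confident pattern needs to be rejected.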


Holographic Recurrent Networks

Neural Information Processing Systems

Holographic Recurrent Networks (HRNs) are recurrent networks which incorporate associative memory techniques for storing sequential structure. HRNs can be easily and quickly trained using gradient descent techniques to generate sequences of discrete outputs and trajectories through continuous space. The performance of HRNs is found to be superior to that of ordinary recurrent networks on these sequence generation tasks.


Intersecting regions: The Key to combinatorial structure in hidden unit space

Neural Information Processing Systems

Hidden units in multi-layer networks form a representation space in which each region can be identified with a class of equivalent outputs (Elman, 1989) or a logical state in a finite state machine (Cleeremans, Servan-Schreiber & McClelland, 1989; Giles, Sun, Chen, Lee, & Chen, 1990). We extend the analysis of the spatial structure of hidden unit space to a combinatorial task, based on binding features together in a visual scene. The logical structure requires a combinatorial number of states to represent all valid scenes. On analysing our networks, we find that the high dimensionality of hidden unit space is exploited by using the intersection of neighboring regions to represent conjunctions of features. These results show how combinatorial structure can be based on the spatial nature of networks, and not just on their emulation of logical structure.


Computing with Almost Optimal Size Neural Networks

Neural Information Processing Systems

Artificial neural networks are composed of an interconnected collection of certain nonlinear devices; examples of commonly used devices include linear threshold elements, sigmoidal elements and radial-basis elements. We employ results from harmonic analysis and the theory of rational approximation to obtain almost tight lower bounds on the size (i.e.


Single-Iteration Threshold Hamming Networks

Neural Information Processing Systems

The Hamming network (HN) calculates the Hamming distance between the input pattern and each memory pattern, and selects the memory with the smallest distance. It is composed of two subnets. The similarity subnet, consisting of an n-neuron input layer connected with an m-neuron memory layer, calculates the number of equal bits between the input and each memory pattern. The winner-take-all (WTA) subnet, consisting of a fully connected m-neuron topology, selects the memory neuron that best matches the input pattern.
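The two-subnet structure described above can be sketched functionally. This is not the paper's single-iteration threshold design: the WTA subnet is replaced here by a plain argmax, and the function names and sample patterns are made up for illustration.

```python
def similarity_scores(x, memories):
    # Similarity subnet: number of equal bits between the input and each memory.
    # For n-bit patterns this is n minus the Hamming distance.
    return [sum(1 for a, b in zip(x, m) if a == b) for m in memories]

def winner_take_all(scores):
    # Stand-in for the WTA subnet: index of the largest score
    # (ties resolved in favor of the lowest index).
    return max(range(len(scores)), key=lambda i: scores[i])

def hamming_recall(x, memories):
    # Return the stored memory closest to x in Hamming distance.
    return memories[winner_take_all(similarity_scores(x, memories))]

memories = [
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
noisy = [1, 0, 1, 1, 0, 1]  # first memory with its last bit flipped
scores = similarity_scores(noisy, memories)
recalled = hamming_recall(noisy, memories)
```

Maximizing the count of equal bits is equivalent to minimizing the Hamming distance, which is why the similarity subnet can feed a maximum-selecting WTA stage.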


A Boundary Hunting Radial Basis Function Classifier which Allocates Centers Constructively

Neural Information Processing Systems

A new boundary hunting radial basis function (BH-RBF) classifier which allocates RBF centers constructively near class boundaries is described. This classifier creates complex decision boundaries only in regions where confusions occur and corresponding RBF outputs are similar. A predicted square error measure is used to determine how many centers to add and to determine when to stop adding centers. Two experiments are presented which demonstrate the advantages of the BH-RBF classifier. One uses artificial data with two classes and two input features where each class contains four clusters but only one cluster is near a decision region boundary.