Collaborating Authors

Flake, Gary William


Differentiating Functions of the Jacobian with Respect to the Weights

Neural Information Processing Systems

For many problems, the correct behavior of a model depends not only on its input-output mapping but also on properties of its Jacobian matrix, the matrix of partial derivatives of the model's outputs with respect to its inputs. We introduce the J-prop algorithm, an efficient general method for computing the exact partial derivatives of a variety of simple functions of the Jacobian of a model with respect to its free parameters. The algorithm applies to any parametrized feedforward model, including nonlinear regression, multilayer perceptrons, and radial basis function networks.
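For reference, the quantity J-prop computes exactly can also be obtained, though generally less efficiently, by nesting a modern automatic-differentiation system. Below is a minimal sketch in JAX; the toy model and the choice of scalar function of the Jacobian (its squared Frobenius norm) are illustrative assumptions, not the paper's algorithm or examples.

import jax
import jax.numpy as jnp

def model(w, x):
    # Toy feedforward model: y = tanh(W x), with w the flattened weight matrix.
    W = w.reshape(2, 3)
    return jnp.tanh(W @ x)

def jacobian_penalty(w, x):
    # A simple scalar function of the Jacobian dy/dx: its squared Frobenius norm.
    J = jax.jacobian(model, argnums=1)(w, x)
    return jnp.sum(J ** 2)

w = jnp.ones(6)
x = jnp.array([0.5, -1.0, 2.0])
# Exact partial derivatives of the Jacobian penalty with respect to the weights.
grad_w = jax.grad(jacobian_penalty, argnums=0)(w, x)
print(grad_w.shape)  # (6,)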


The Capacity of a Bump

Neural Information Processing Systems

Recently, several researchers have reported encouraging experimental results when using Gaussian or bump-like activation functions in multilayer perceptrons. Networks of this type usually require fewer hidden layers and units and often learn much faster than typical sigmoidal networks. To explain these results we consider a hyper-ridge network, which is a simple perceptron with no hidden units and a ridge activation function. If we are interested in partitioning p points in d dimensions into two classes, then in the limit as d approaches infinity the capacity of a hyper-ridge and a perceptron is identical.
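For context on the perceptron side of this comparison, the classical function-counting result (Cover, 1965) gives the number of dichotomies of p points in general position in d dimensions that a linear threshold unit can realize:

    C(p, d) = 2 \sum_{k=0}^{d-1} \binom{p-1}{k}

so the perceptron's capacity approaches 2d points as d grows; the abstract's claim is that a hyper-ridge unit attains this same limit.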

