AITopics | equivalent kernel

In this paper, we tackle the Bayesian estimation of point process intensity as a function of covariates. We propose a novel augmentation of permanental process called augmented permanental process, a doubly-stochastic point process that uses a Gaussian process on covariate space to describe the Bayesian a pri-ori uncertainty present in the square root of intensity, and derive a fast Bayesian estimation algorithm that scales linearly with data size without relying on either domain discretization or Markov Chain Monte Carlo computation. The proposed algorithm is based on a non-trivial finding that the representer theorem, one of the most desirable mathematical property for machine learning problems, holds for the augmented permanental process, which provides us with many significant computational advantages. We evaluate our algorithm on synthetic and real-world data, and show that it outperforms state-of-the-art methods in terms of predictive accuracy while being substantially faster than a conventional Bayesian method.

artificial intelligence, bayesian inference, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.04)
Europe > Spain > Castilla-La Mancha (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Tensor Switching Networks, Andrew Saxe

Neural Information Processing SystemsMar-12-2024, 16:28:51 GMT

We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensor-valued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deep-network-like function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a one-pass linear learning algorithm, and two backpropagation-inspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks.

kernel, representation, ts network, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Estimating Equivalent Kernels for Neural Networks: A Data Perturbation Approach

Neural Information Processing SystemsApr-6-2023, 18:13:42 GMT

We describe the notion of "equivalent kernels" and suggest that this provides a framework for comparing different classes of regression models, including neural networks and both parametric and non-parametric statistical techniques. Unfortunately, standard techniques break down when faced with models, such as neural networks, in which there is more than one "layer" of adjustable parameters. We propose an algorithm which overcomes this limitation, estimating the equivalent kernels for neural network models using a data perturbation approach. Experimental results indicate that the networks do not use the maximum possible number of degrees of freedom, that these can be controlled using regularisation techniques and that the equivalent kernels learnt by the network vary both in "size" and in "shape" in different regions of the input space.

data perturbation approach, equivalent kernel, neural network

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Using the Equivalent Kernel to Understand Gaussian Process Regression

Neural Information Processing SystemsApr-6-2023, 15:51:45 GMT

The equivalent kernel [1] is a way of understanding how Gaussian pro- cess regression works for large sample sizes based on a continuum limit. In this paper we show (1) how to approximate the equivalent kernel of the widely-used squared exponential (or Gaussian) kernel and related ker- nels, and (2) how analysis using the equivalent kernel helps to understand the learning curves for Gaussian processes. Consider the supervised regression problem for a dataset D with entries (xi, yi) for i 1, . . . We can define a vector of functions h(x) (K 2I)-1k(x) . Thus we have f (x) h (x)y, making it clear that the mean prediction at a point x is a linear combination of the target values y.

approximation, kernel, regression, (12 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.90)
Information Technology > Modeling & Simulation (0.62)

Add feedback

Invariance of Weight Distributions in Rectified MLPs

Tsuchida, Russell, Roosta-Khorasani, Farbod, Gallagher, Marcus

arXiv.org Machine LearningFeb-2-2018

An interesting approach to analyzing and developing tools for neural networks that has received renewed attention is to examine the equivalent kernel of the neural network. This is based on the fact that a fully connected feedforward network with one hidden layer, a certain weight distribution, an activation function, and an infinite number of neurons is a mapping that can be viewed as a projection into a Hilbert space. We show that the equivalent kernel of an MLP with ReLU or Leaky ReLU activations for all rotationally-invariant weight distributions is the same, generalizing a previous result that required Gaussian weight distributions. We derive the equivalent kernel for these cases. In deep networks, the equivalent kernel approaches a pathological fixed point, which can be used to argue why training randomly initialized networks can be difficult. Our results also have implications for weight initialization and the level sets in neural network cost functions.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1711.0909

Country:

North America > Canada > Ontario > Toronto (0.14)
Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States (0.04)
(3 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Tensor Switching Networks

Tsai, Chuan-Yung, Saxe, Andrew M., Saxe, Andrew M., Cox, David

Neural Information Processing SystemsDec-31-2016

We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensor-valued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deep-network-like function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a one-pass linear learning algorithm, and two backpropagation-inspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks.

kernel, representation, ts network, (17 more...)

Neural Information Processing Systems

Country: