AITopics | g-var

Collaborating Authors

g-var

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

TensorProgramsI

Neural Information Processing SystemsFeb-12-2026, 08:12:35 GMT

Please see https://arxiv.org/abs/1910.12478 for the full version of this paper.

artificial intelligence, deep learning, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

5e69fda38cda2060819766569fd93aa5-Paper.pdf

Neural Information Processing SystemsOct-2-2025, 20:07:22 GMT

artificial intelligence, g-var, machine learning, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Forecasting Graph Signals with Recursive MIMO Graph Filters

van der Hoeven, Jelmer, Natali, Alberto, Leus, Geert

arXiv.org Artificial IntelligenceOct-27-2022

Forecasting time series on graphs is a fundamental problem in graph signal processing. When each entity of the network carries a vector of values for each time stamp instead of a scalar one, existing approaches resort to the use of product graphs to combine this multidimensional information, at the expense of creating a larger graph. In this paper, we show the limitations of such approaches, and propose extensions to tackle them. Then, we propose a recursive multiple-input multiple-output graph filter which encompasses many already existing models in the literature while being more flexible. Numerical simulations on a real world data set show the effectiveness of the proposed models.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2210.15258

Country:

Europe > Netherlands > South Holland > Delft (0.05)
Asia > China > Beijing > Beijing (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Communications > Social Media (0.94)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

The Recurrent Neural Tangent Kernel

Alemohammad, Sina, Wang, Zichao, Balestriero, Randall, Baraniuk, Richard

arXiv.org Machine LearningOct-2-2020

The study of deep networks (DNs) in the infinite-width limit, via the so-called Neural Tangent Kernel (NTK) approach, has provided new insights into the dynamics of learning, generalization, and the impact of initialization. One key DN architecture remains to be kernelized, namely, the Recurrent Neural Network (RNN). In this paper we introduce and study the Recurrent Neural Tangent Kernel (RNTK), which sheds new insights into the behavior of overparametrized RNNs, including how different time steps are weighted by the RNTK to form the output under different initialization parameters and nonlinearity choices, and how inputs of different lengths are treated. We demonstrate via a number of experiments that the RNTK offers significant performance gains over other kernels, including standard NTKs across a range of different data sets. A unique benefit of the RNTK is that it is agnostic to the length of the input, in stark contrast to other kernels.

artificial intelligence, machine learning, rntk, (16 more...)

arXiv.org Machine Learning

2006.10246

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Texas > Harris County > Houston (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Tensor Programs II: Neural Tangent Kernel for Any Architecture

Yang, Greg

arXiv.org Machine LearningAug-18-2020

We prove that a randomly initialized neural network of *any architecture* has its Tangent Kernel (NTK) converge to a deterministic limit, as the network widths tend to infinity. We demonstrate how to calculate this limit. In prior literature, the heuristic study of neural network gradients often assumes every weight matrix used in forward propagation is independent from its transpose used in backpropagation (Schoenholz et al. 2017). This is known as the *gradient independence assumption (GIA)*. We identify a commonly satisfied condition, which we call *Simple GIA Check*, such that the NTK limit calculation based on GIA is correct. Conversely, when Simple GIA Check fails, we show GIA can result in wrong answers. Our material here presents the NTK results of Yang (2019a) in a friendly manner and showcases the *tensor programs* technique for understanding wide neural networks. We provide reference implementations of infinite-width NTKs of recurrent neural network, transformer, and batch normalization at https://github.com/thegregyang/NTK4A.

artificial intelligence, machine learning, theorem, (18 more...)

arXiv.org Machine Learning

2006.14548

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation

Yang, Greg

arXiv.org Machine LearningFeb-13-2019

Several recent trends in machine learning theory and practice, from the design of state-of-the-art Gaussian Process to the convergence analysis of deep neural nets (DNNs) under stochastic gradient descent (SGD), have found it fruitful to study wide random neural networks. Central to these approaches are certain scaling limits of such networks. We unify these results by introducing a notion of a straightline \emph{tensor program} that can express most neural network computations, and we characterize its scaling limit when its tensors are large and randomized. From our framework follows (1) the convergence of random neural networks to Gaussian processes for architectures such as recurrent neural networks, convolutional neural networks, residual networks, attention, and any combination thereof, with or without batch normalization; (2) conditions under which the \emph{gradient independence assumption} -- that weights in backpropagation can be assumed to be independent from weights in the forward pass -- leads to correct computation of gradient dynamics, and corrections when it does not; (3) the convergence of the Neural Tangent Kernel, a recently proposed kernel used to predict training dynamics of neural networks under gradient descent, at initialization for all architectures in (1) without batch normalization. Mathematically, our framework is general enough to rederive classical random matrix results such as the semicircle and the Marchenko-Pastur laws, as well as recent results in neural network Jacobian singular values. We hope our work opens a way toward design of even stronger Gaussian Processes, initialization schemes to avoid gradient explosion/vanishing, and deeper understanding of SGD dynamics in modern architectures.

computation, scaling limit, wide neural network, (11 more...)

arXiv.org Machine Learning

1902.0476

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback