AITopics

Genre: Summary/Review (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.61)

Neural Information Processing Systems, Oct-7-2024, 08:03:22 GMT

Reviews: Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming

The paper describes Lantern, a framework for automatic differentiation in Scala, based on callbacks and continuation-passing style. It compares against PyTorch and TensorFlow on several benchmark tasks. There are two main aspects of the paper: reverse-mode automatic differentiation with continuations, and code generation via multi-stage programming. The submission does not provide code for the proposed framework, which I don't find acceptable for a paper on a software package. It's unclear to me how the first aspect is different from any other implementation of automatic differentiation via operator overloading.
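To make the idea of reverse-mode differentiation with callbacks concrete, here is a minimal Python sketch of the general technique. It is not Lantern's API (Lantern is a Scala framework, and its code is not shown in this excerpt); the names `Num`, `add`, `mul`, and `grad` are hypothetical. Each operator receives a continuation `k` for the rest of the forward computation, calls it, and only then propagates gradients, so the backward pass runs as the call stack unwinds.

```python
class Num:
    """A forward value paired with its accumulated gradient."""
    def __init__(self, x):
        self.x = x      # forward value
        self.d = 0.0    # gradient accumulated during the backward pass

def add(a, b, k):
    y = Num(a.x + b.x)
    k(y)                # run the rest of the forward computation
    a.d += y.d          # once it returns, push the gradient to the inputs
    b.d += y.d

def mul(a, b, k):
    y = Num(a.x * b.x)
    k(y)
    a.d += b.x * y.d
    b.d += a.x * y.d

def grad(f):
    """Return a function computing df/dx for a CPS-style function f(x, k)."""
    def df(x0):
        x = Num(x0)
        def seed(y):
            y.d = 1.0   # final continuation: seed the output gradient
        f(x, seed)
        return x.d
    return df

# Example: f(x) = x * x + x, written in continuation-passing style.
def f(x, k):
    mul(x, x, lambda t: add(t, x, k))

print(grad(f)(3.0))  # 7.0, since f'(x) = 2x + 1
```

No explicit tape is kept in this sketch: the order of the gradient updates is fixed by the nesting of the continuations.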

automatic differentiation, efficient and expressive differentiable programming, pytorch and tensorflow, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.40)

Neural Information Processing Systems, Oct-7-2024, 06:20:43 GMT

Reviews: Dendritic cortical microcircuits approximate the backpropagation algorithm

Using two compartments allows errors and activities to be represented within the same neuron. The overall procedure is similar to contrastive Hebbian learning and relies on weak top-down feedback from an initial 'self-predicting' settled state, but unlike contrastive Hebbian learning does not require separate phases. Experimental results show that the method can attain reasonable results on MNIST. Major comments: This paper presents an interesting approach to approximately implementing backpropagation that relies on a mixture of dendritic compartments and specific circuitry motifs. This is a fundamentally important topic and the results would likely be of interest to many, even if the specific hypothesis turns out to be incorrect.

backpropagation algorithm, cortical microcircuit approximate, dendritic cortical microcircuit, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.67)

Neural Information Processing Systems, Oct-4-2024, 11:06:30 GMT

The Reversible Residual Network: Backpropagation Without Storing Activations

Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse

Deep residual networks (ResNets) have significantly pushed forward the state-of-the-art on image classification, increasing in performance as networks grow both deeper and wider. However, memory consumption becomes a bottleneck, as one needs to store the activations in order to calculate gradients using backpropagation. We present the Reversible Residual Network (RevNet), a variant of ResNets where each layer's activations can be reconstructed exactly from the next layer's. Therefore, the activations for most layers need not be stored in memory during backpropagation. We demonstrate the effectiveness of RevNets on CIFAR-10, CIFAR-100, and ImageNet, establishing nearly identical classification accuracy to equally-sized ResNets, even though the activation storage requirements are independent of depth.
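The reconstruction property can be illustrated with a small sketch. It assumes an additive coupling of two input halves, `y1 = x1 + F(x2)` and `y2 = x2 + G(y1)`; the residual functions `F` and `G` below are arbitrary placeholders chosen for illustration, not the convolutional blocks of an actual RevNet. The point is that the inputs can be recovered from the outputs without inverting `F` or `G`.

```python
import math

def F(v):
    # Hypothetical residual function; any function of v would work here.
    return [math.tanh(2.0 * t) for t in v]

def G(v):
    # Second hypothetical residual function.
    return [0.5 * t * t for t in v]

def forward(x1, x2):
    """Reversible block: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = [a + b for a, b in zip(x1, F(x2))]
    y2 = [a + b for a, b in zip(x2, G(y1))]
    return y1, y2

def inverse(y1, y2):
    """Recover the inputs from the outputs; F and G are only re-evaluated."""
    x2 = [a - b for a, b in zip(y2, G(y1))]
    x1 = [a - b for a, b in zip(y1, F(x2))]
    return x1, x2

x1, x2 = [0.1, -0.4], [0.7, 0.2]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print(r1, r2)  # matches x1, x2 up to floating-point rounding
```

During a backward pass, a block like this would recompute its inputs from its outputs instead of reading them from memory, trading extra computation for activation storage.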

activation, architecture, arxiv preprint arxiv, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.83)

Krzysztof M. Choromanski, Vikas Sindhwani

On Blackbox Backpropagation and Jacobian Sensing

Neural Information Processing Systems, Oct-4-2024, 00:07:00 GMT

From a small number of calls to a given "blackbox" on random input perturbations, we show how to efficiently recover its unknown Jacobian, or estimate the left action of its Jacobian on a given vector. Our methods are based on a novel combination of compressed sensing and graph coloring techniques, and provably exploit structural prior knowledge about the Jacobian such as sparsity and symmetry while being noise robust. We demonstrate efficient backpropagation through noisy blackbox layers in a deep neural net, improved data-efficiency in the task of linearizing the dynamics of a rigid body system, and the generic ability to handle a rich class of input-output dependency structures in Jacobian estimation problems.
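As a rough illustration of how structural knowledge reduces the number of blackbox calls, the sketch below uses the classical coloring-based finite-difference scheme for a Jacobian known to be tridiagonal. This is not the paper's method, which combines compressed sensing with graph coloring and handles noise; the `blackbox` function, the tridiagonal sparsity pattern, and the deterministic perturbations are all assumptions made for the example. Columns with the same color never share a nonzero row, so they can be perturbed in a single call.

```python
def blackbox(x):
    # Hypothetical blackbox: output i depends only on x[i-1], x[i], x[i+1].
    n = len(x)
    return [
        (x[i - 1] if i > 0 else 0.0)
        + 2.0 * x[i] * x[i]
        - (x[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]

def estimate_jacobian(f, x, eps=1e-6):
    """Estimate a tridiagonal Jacobian with 3 perturbed calls instead of n."""
    n = len(x)
    base = f(x)
    J = [[0.0] * n for _ in range(n)]
    for color in range(3):
        # Columns color, color+3, color+6, ... touch disjoint sets of rows.
        cols = range(color, n, 3)
        xp = list(x)
        for j in cols:
            xp[j] += eps
        diff = [(a - b) / eps for a, b in zip(f(xp), base)]
        for j in cols:
            # Rows j-1, j, j+1 are the only ones column j can affect.
            for i in range(max(0, j - 1), min(n, j + 2)):
                J[i][j] = diff[i]
    return J

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
J = estimate_jacobian(blackbox, x)
print([round(v, 3) for v in J[2]])  # row 2: approximately [0, 1, 12, -1, 0, 0]
```

Here four calls (one baseline plus three perturbed) recover all entries regardless of `n`, whereas perturbing one coordinate at a time would need `n + 1` calls.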

graph, jacobian, matrix, (17 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial Intelligence, Oct-4-2024

FastLRNR and Sparse Physics Informed Backpropagation

Cho, Woojin, Lee, Kookjin, Park, Noseong, Rim, Donsub, Welper, Gerrit

We introduce Sparse Physics Informed Backpropagation (SPInProp), a new class of methods for accelerating backpropagation for a specialized neural network architecture called Low Rank Neural Representation (LRNR). The approach exploits the low rank structure within LRNR and constructs a reduced neural network approximation that is much smaller in size. We call the smaller network FastLRNR. We show that backpropagation of FastLRNR can be substituted for that of LRNR, enabling a significant reduction in complexity. We apply SPInProp to a physics informed neural networks framework and demonstrate how the solution of parametrized partial differential equations is accelerated.

artificial intelligence, deep learning, machine learning, (11 more...)

2410.04001

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (1.00)

D'Amico, Francesco, Negri, Matteo

Self-attention as an attractor network: transient memories without backpropagation

arXiv.org Artificial Intelligence, Sep-24-2024

Transformers are one of the most successful architectures of modern neural networks. At their core is the so-called attention mechanism, which has recently interested the physics community because it can be written as the derivative of an energy function in certain cases: while it is possible to write the cross-attention layer as a modern Hopfield network, the same is not possible for self-attention, which is used in the GPT architectures and other autoregressive models. In this work we show that it is possible to obtain the self-attention layer as the derivative of local energy terms, which resemble a pseudo-likelihood. We leverage the analogy with pseudo-likelihood to design a recurrent model that can be trained without backpropagation: the dynamics show transient states that are strongly correlated with both train and test examples. Overall we present a novel framework to interpret self-attention as an attractor network, potentially paving the way for new theoretical approaches inspired by physics to understand transformers.

bare self-attention, transformer, transformer block, (15 more...)

2409.16112

Country: Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.61)

Barley, Daniel, Fröning, Holger

Less Memory Means smaller GPUs: Backpropagation with Compressed Activations

arXiv.org Artificial Intelligence, Sep-18-2024

The ever-growing scale of deep neural networks (DNNs) has led to an equally rapid growth in computational resource requirements. Many recent architectures, most prominently Large Language Models, have to be trained using supercomputers with thousands of accelerators, such as GPUs or TPUs. In addition to the vast number of floating point operations, the memory footprint of DNNs is also exploding. In contrast, GPU architectures are notoriously short on memory. Even comparatively small architectures like some EfficientNet variants cannot be trained on a single consumer-grade GPU at reasonable mini-batch sizes. During training, intermediate input activations have to be stored until backpropagation for gradient calculation. These make up the vast majority of the memory footprint. In this work we therefore consider compressing activation maps for the backward pass using pooling, which can reduce both the memory footprint and amount of data movement. The forward computation remains uncompressed. We empirically show convergence and study effects on feature detection using the example of the common vision architecture ResNet. With this approach we are able to reduce the peak memory consumption by 29% at the cost of a longer training schedule, while maintaining prediction accuracy compared to an uncompressed baseline.
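The mechanism can be sketched on a toy one-dimensional linear layer. The abstract states only that activations are pooled for the backward pass while the forward pass stays uncompressed; the average pooling with window `k`, the upsampling by repetition, and the helper names below are assumptions for illustration and may differ from the paper's implementation.

```python
def avg_pool(x, k=2):
    """Average non-overlapping windows of size k (len(x) divisible by k)."""
    return [sum(x[i:i + k]) / k for i in range(0, len(x), k)]

def upsample(p, k=2):
    """Repeat each pooled value k times to restore the original length."""
    return [v for v in p for _ in range(k)]

def forward(w, x, k=2):
    # The forward pass uses the full, uncompressed input.
    y = sum(wi * xi for wi, xi in zip(w, x))
    saved = avg_pool(x, k)  # only the pooled input is kept for backward
    return y, saved

def backward(dy, saved, k=2):
    # For y = w . x the exact weight gradient is dy * x; here x is
    # replaced by its pooled-then-upsampled approximation.
    return [dy * xi for xi in upsample(saved, k)]

w = [0.5, -1.0, 2.0, 0.0]
x = [1.0, 3.0, 2.0, 2.0]
y, saved = forward(w, x)
print(y, saved)              # 1.5 [2.0, 2.0]
print(backward(1.0, saved))  # [2.0, 2.0, 2.0, 2.0]; exact would be [1.0, 3.0, 2.0, 2.0]
```

The stored activation is half the size of the input, and the resulting weight gradient is an approximation, which is consistent with the abstract's note that a longer training schedule is needed.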

artificial intelligence, deep learning, machine learning, (18 more...)

2409.11902

Country: Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.63)

Rüb, Marcus, Sikora, Axel, Mueller-Gritschneder, Daniel

Advancing On-Device Neural Network Training with TinyPropv2: Dynamic, Sparse, and Efficient Backpropagation

arXiv.org Artificial Intelligence, Sep-11-2024

This study introduces TinyPropv2, an innovative algorithm optimized for on-device learning in deep neural networks, specifically designed for low-power microcontroller units. TinyPropv2 refines sparse backpropagation by dynamically adjusting the level of sparsity, including the ability to selectively skip training steps. This feature significantly lowers computational effort without substantially compromising accuracy. Our comprehensive evaluation across diverse datasets (CIFAR 10, CIFAR100, Flower, Food, Speech Command, MNIST, HAR, and DCASE2020) reveals that TinyPropv2 achieves near-parity with full training methods, with an average accuracy drop of only around 1 percent in most cases. For instance, against full training, TinyPropv2's accuracy drop is only 0.82 percent on CIFAR 10 and 1.07 percent on CIFAR100. In terms of computational effort, TinyPropv2 shows a marked reduction, requiring as little as 10 percent of the computational effort needed for full training in some scenarios, and consistently outperforms other sparse training methodologies. These findings underscore TinyPropv2's capacity to efficiently manage computational resources while maintaining high accuracy, positioning it as an advantageous solution for advanced embedded device applications in the IoT ecosystem.

efficient backpropagation, on-device neural network training, tinypropv2, (1 more...)

doi: 10.1109/IJCNN60899.2024.10650122

2409.07109

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.60)

arXiv.org Artificial Intelligence, Aug-18-2024

Rethinking Deep Learning: Propagating Information in Neural Networks without Backpropagation and Statistical Optimization

Itoh, Kei

Developing strong AI signifies the arrival of technological singularity, contributing greatly to advancing human civilization and resolving social issues. Neural networks (NNs) and deep learning, which utilize NNs, are expected to lead to strong AI due to their biological neural system-mimicking structures. However, the statistical weight optimization techniques commonly used, such as error backpropagation and loss functions, may hinder the mimicry of neural systems. This study discusses the information propagation capabilities and potential practical applications of NNs as neural system mimicking structures by solving the handwritten character recognition problem in the Modified National Institute of Standards and Technology (MNIST) database without using statistical weight optimization techniques like error backpropagation. In this study, the NN architecture comprises fully connected layers using step functions as activation functions, with 0-15 hidden layers, and no weight updates. The accuracy is calculated by comparing the average output vectors of the training data for each label with the output vectors of the test data, based on vector similarity. The results showed that the maximum accuracy achieved is around 80%. This indicates that NNs can propagate information correctly without using statistical weight optimization. Additionally, the accuracy decreased with an increasing number of hidden layers. This is attributed to the decrease in the variance of the output vectors as the number of hidden layers increases, suggesting that the output data becomes smooth. This study's NNs and accuracy calculation methods are simple and have room for various improvements. Moreover, creating a feedforward NN that repeatedly cycles through 'input -> processing -> output -> environmental response -> input -> ...' could pave the way for practical software applications.

information, output vector, vector, (12 more...)

2409.0376

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report > New Finding (0.90)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.83)