AITopics | Backpropagation

Collaborating Authors

Backpropagation

News Overviews Instructional Materials AI-Alerts Classics

4D-Var using Hessian approximation and backpropagation applied to automatically-differentiable numerical and machine learning models

Solvik, Kylen, Penny, Stephen G., Hoyer, Stephan

arXiv.org Artificial IntelligenceAug-5-2024

Constraining a numerical weather prediction (NWP) model with observations via 4D variational (4D-Var) data assimilation is often difficult to implement in practice due to the need to develop and maintain a software-based tangent linear model and adjoint model. One of the most common 4D-Var algorithms uses an incremental update procedure, which has been shown to be an approximation of the Gauss-Newton method. Here we demonstrate that when using a forecast model that supports automatic differentiation, an efficient and in some cases more accurate alternative approximation of the Gauss-Newton method can be applied by combining backpropagation of errors with Hessian approximation. This approach can be used with either a conventional numerical model implemented within a software framework that supports automatic differentiation, or a machine learning (ML) based surrogate model. We test the new approach on a variety of Lorenz-96 and quasi-geostrophic models. The results indicate potential for a deeper integration of modeling, data assimilation, and new technologies in a next-generation of operational forecast systems that leverage weather models designed to support automatic differentiation.

backprop-4dvar, standard deviation, time step, (14 more...)

arXiv.org Artificial Intelligence

2408.02767

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.61)

Add feedback

Comprehensive Survey of Complex-Valued Neural Networks: Insights into Backpropagation and Activation Functions

Hammad, M. M.

arXiv.org Artificial IntelligenceJul-27-2024

Artificial neural networks (ANNs), particularly those employing deep learning models, have found widespread application in fields such as computer vision, signal processing, and wireless communications, where complex numbers are crucial. Despite the prevailing use of real-number implementations in current ANN frameworks, there is a growing interest in developing ANNs that utilize complex numbers. This paper presents a comprehensive survey of recent advancements in complex-valued neural networks (CVNNs), focusing on their activation functions (AFs) and learning algorithms. We delve into the extension of the backpropagation algorithm to the complex domain, which enables the training of neural networks with complex-valued inputs, weights, AFs, and outputs. This survey considers three complex backpropagation algorithms: the complex derivative approach, the partial derivatives approach, and algorithms incorporating the Cauchy-Riemann equations. A significant challenge in CVNN design is the identification of suitable nonlinear Complex Valued Activation Functions (CVAFs), due to the conflict between boundedness and differentiability over the entire complex plane as stated by Liouville's theorem. We examine both fully complex AFs, which strive for boundedness and differentiability, and split AFs, which offer a practical compromise despite not preserving analyticity. This review provides an in-depth analysis of various CVAFs essential for constructing effective CVNNs. Moreover, this survey not only offers a comprehensive overview of the current state of CVNNs but also contributes to ongoing research and development by introducing a new set of CVAFs (fully complex, split and complex amplitude-phase AFs).

complex number, imaginary part, magnitude, (14 more...)

arXiv.org Artificial Intelligence

2407.19258

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Africa > Middle East > Egypt (0.04)

Genre: Overview (1.00)

Industry:

Energy (0.67)
Health & Medicine > Therapeutic Area > Neurology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.81)

Add feedback

Tool Shape Optimization through Backpropagation of Neural Network

Kawaharazuka, Kento, Ogawa, Toru, Nabeshima, Cota

arXiv.org Artificial IntelligenceJul-16-2024

When executing a certain task, human beings can choose or make an appropriate tool to achieve the task. This research especially addresses the optimization of tool shape for robotic tool-use. We propose a method in which a robot obtains an optimized tool shape, tool trajectory, or both, depending on a given task. The feature of our method is that a transition of the task state when the robot moves a certain tool along a certain trajectory is represented by a deep neural network. We applied this method to object manipulation tasks on a 2D plane, and verified that appropriate tool shapes are generated by using this novel method.

optimization, tool shape, trajectory, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/IROS45743.2020.9341583

2407.12202

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation

Yang, Yuchen, Shi, Yingdong, Wang, Cheems, Zhen, Xiantong, Shi, Yuxuan, Xu, Jun

arXiv.org Artificial IntelligenceJun-23-2024

Fine-tuning pretrained large models to downstream tasks is an important problem, which however suffers from huge memory overhead due to large-scale parameters. This work strives to reduce memory overhead in fine-tuning from perspectives of activation function and layer normalization. To this end, we propose the Approximate Backpropagation (Approx-BP) theory, which provides the theoretical feasibility of decoupling the forward and backward passes. We apply our Approx-BP theory to backpropagation training and derive memory-efficient alternatives of GELU and SiLU activation functions, which use derivative functions of ReLUs in the backward pass while keeping their forward pass unchanged. In addition, we introduce a Memory-Sharing Backpropagation strategy, which enables the activation memory to be shared by two adjacent layers, thereby removing activation memory usage redundancy. Our method neither induces extra computation nor reduces training efficiency. We conduct extensive experiments with pretrained vision and language models, and the results demonstrate that our proposal can reduce up to $\sim$$30\%$ of the peak memory usage. Our code is released at https://github.com/yyyyychen/LowMemoryBP.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2406.16282

Country:

North America > United States (0.28)
Asia > China (0.28)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Energy > Oil & Gas (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

HLQ: Fast and Efficient Backpropagation via Hadamard Low-rank Quantization

Kim, Seonggon, Park, Eunhyeok

arXiv.org Artificial IntelligenceJun-21-2024

With the rapid increase in model size and the growing importance of various fine-tuning applications, lightweight training has become crucial. Since the backward pass is twice as expensive as the forward pass, optimizing backpropagation is particularly important. However, modifications to this process can lead to suboptimal convergence, so training optimization should minimize perturbations, which is a highly challenging task. In this study, we introduce a novel optimization strategy called Hadamard Low-rank Quantization (HLQ), focusing on reducing the cost of backpropagation in convolutional and linear layers. We first analyze the sensitivity of gradient computation with respect to activation and weight, and judiciously design the HLQ pipeline to apply 4-bit Hadamard quantization to the activation gradient and Hadamard low-rank approximation to the weight gradient. This combination was found to be the best for maximizing benefits, and our extensive experiments demonstrate the outstanding performance of HLQ in both training from scratch and fine-tuning, achieving significant memory savings and acceleration on real GPUs with negligible quality degradation.

artificial intelligence, machine learning, quantization, (16 more...)

arXiv.org Artificial Intelligence

2406.15102

Genre: Research Report > New Finding (0.34)

Industry: Energy > Oil & Gas (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Attention-aware Post-training Quantization without Backpropagation

Kim, Junhan, Kim, Ho-young, Cho, Eulrang, Lee, Chungman, Kim, Joonyoung, Jeon, Yongkweon

arXiv.org Artificial IntelligenceJun-19-2024

Quantization is a promising solution for deploying large-scale language models (LLMs) on resource-constrained devices. Existing quantization approaches, however, rely on gradient-based optimization, regardless of it being post-training quantization (PTQ) or quantization-aware training (QAT), which becomes problematic for hyper-scale LLMs with billions of parameters. This overhead can be alleviated via recently proposed backpropagation-free PTQ methods; however, their performance is somewhat limited by their lack of consideration of inter-layer dependencies. In this paper, we thus propose a novel PTQ algorithm that considers inter-layer dependencies without relying on backpropagation. The fundamental concept involved is the development of attention-aware Hessian matrices, which facilitates the consideration of inter-layer dependencies within the attention module. Extensive experiments demonstrate that the proposed algorithm significantly outperforms conventional PTQ methods, particularly for low bit-widths.

hessian, optq, quantization, (16 more...)

arXiv.org Artificial Intelligence

2406.13474

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.81)

Add feedback

Grad-Instructor: Universal Backpropagation with Explainable Evaluation Neural Networks for Meta-learning and AutoML

Ino, Ryohei

arXiv.org Artificial IntelligenceJun-15-2024

This paper presents a novel method for autonomously enhancing deep neural network training. My approach employs an Evaluation Neural Network (ENN) trained via deep reinforcement learning to predict the performance of the target network. The ENN then works as an additional evaluation function during backpropagation. Computational experiments with Multi-Layer Perceptrons (MLPs) demonstrate the method's effectiveness. By processing input data at 0.15^2 times its original resolution, the ENNs facilitated efficient inference. Results indicate that MLPs trained with the proposed method achieved a mean test accuracy of 93.02%, which is 2.8% higher than those trained solely with conventional backpropagation or with L1 regularization. The proposed method's test accuracy is comparable to networks initialized with He initialization while reducing the difference between test and training errors. These improvements are achieved without increasing the number of epochs, thus avoiding the risk of overfitting. Additionally, the proposed method dynamically adjusts gradient magnitudes according to the training stage. The optimal ENN for enhancing MLPs can be predicted, reducing the time spent exploring optimal training methodologies. The explainability of ENNs is also analyzed using Grad-CAM, demonstrating their ability to visualize evaluation bases and supporting the Strong Lottery Ticket hypothesis.

mlp, neural network, test accuracy, (13 more...)

arXiv.org Artificial Intelligence

2406.10559

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.69)

Add feedback

COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization

Zhang, Aozhong, Yang, Zi, Wang, Naigang, Qin, Yingyong, Xin, Jack, Li, Xin, Yin, Penghang

arXiv.org Artificial IntelligenceJun-4-2024

Post-training quantization (PTQ) has emerged as a practical approach to compress large neural networks, making them highly efficient for deployment. However, effectively reducing these models to their low-bit counterparts without compromising the original accuracy remains a key challenge. In this paper, we propose an innovative PTQ algorithm termed COMQ, which sequentially conducts coordinate-wise minimization of the layer-wise reconstruction errors. We consider the widely used integer quantization, where every quantized weight can be decomposed into a shared floating-point scalar and an integer bit-code. Within a fixed layer, COMQ treats all the scaling factor(s) and bit-codes as the variables of the reconstruction error. Every iteration improves this error along a single coordinate while keeping all other variables constant. COMQ is easy to use and requires no hyper-parameter tuning. It instead involves only dot products and rounding operations. We update these variables in a carefully designed greedy order, significantly enhancing the accuracy. COMQ achieves remarkable results in quantizing 4-bit Vision Transformers, with a negligible loss of less than 1% in Top-1 accuracy. In 4-bit INT quantization of convolutional neural networks, COMQ maintains near-lossless accuracy with a minimal drop of merely 0.3% in Top-1 accuracy.

comq, neural network, quantization, (13 more...)

arXiv.org Artificial Intelligence

2403.07134

Country: North America > United States > California (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.42)

Add feedback

FFCL: Forward-Forward Net with Cortical Loops, Training and Inference on Edge Without Backpropagation

Karkehabadi, Ali, Homayoun, Houman, Sasan, Avesta

arXiv.org Artificial IntelligenceMay-20-2024

The Forward-Forward Learning (FFL) algorithm is a recently proposed solution for training neural networks without needing memory-intensive backpropagation. During training, labels accompany input data, classifying them as positive or negative inputs. Each layer learns its response to these inputs independently. In this study, we enhance the FFL with the following contributions: 1) We optimize label processing by segregating label and feature forwarding between layers, enhancing learning performance. 2) By revising label integration, we enhance the inference process, reduce computational complexity, and improve performance. 3) We introduce feedback loops akin to cortical loops in the brain, where information cycles through and returns to earlier neurons, enabling layers to combine complex features from previous layers with lower-level features, enhancing learning efficiency.

accuracy, backpropagation, complexity, (17 more...)

arXiv.org Artificial Intelligence

2405.12443

Country:

North America > United States > California > Yolo County > Davis (0.14)
North America > United States > Florida > Hillsborough County > Tampa (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.64)

Add feedback

Stabilizing Backpropagation Through Time to Learn Complex Physics

Schnell, Patrick, Thuerey, Nils

arXiv.org Artificial IntelligenceMay-3-2024

Of all the vector fields surrounding the minima of recurrent learning setups, the gradient field with its exploding and vanishing updates appears a poor choice for optimization, offering little beyond efficient computability. We seek to improve this suboptimal practice in the context of physics simulations, where backpropagating feedback through many unrolled time steps is considered crucial to acquiring temporally coherent behavior. The alternative vector field we propose follows from two principles: physics simulators, unlike neural networks, have a balanced gradient flow, and certain modifications to the backpropagation pass leave the positions of the original minima unchanged. As any modification of backpropagation decouples forward and backward pass, the rotation-free character of the gradient field is lost. Therefore, we discuss the negative implications of using such a rotational vector field for optimization and how to counteract them. Our final procedure is easily implementable via a sequence of gradient stopping and component-wise comparison operations, which do not negatively affect scalability. Our experiments on three control problems show that especially as we increase the complexity of each task, the unbalanced updates from the gradient can no longer provide the precise control signals necessary while our method still solves the tasks. Our code can be found at https://github.com/tum-pbs/StableBPTT.

conference paper, gradient, vector field, (16 more...)

arXiv.org Artificial Intelligence

2405.02041

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Backpropagation (0.82)

Add feedback