AITopics | forward propagation

Training Transformers with 4-bit Integers

Neural Information Processing Systems | Apr-29-2026, 03:27:06 GMT

Quantizing activations, weights, and gradients to 4 bits is a promising way to accelerate neural network training. However, existing 4-bit training methods require custom numerical formats that are not supported by contemporary hardware. In this work, we propose a training method for transformers in which all matrix multiplications are implemented with INT4 arithmetic. Training at such ultra-low precision is challenging. To achieve this, we carefully analyze the specific structures of activations and gradients in transformers and propose dedicated quantizers for them. For forward propagation, we identify the challenge of outliers and propose a Hadamard quantizer to suppress them.
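The outlier problem named in this abstract can be illustrated with a minimal sketch. It is not the authors' quantizer: it assumes a plain symmetric per-tensor INT4 scheme and an orthonormal Walsh-Hadamard transform, and the activation row `x` and all function names are hypothetical. It only shows that the transform spreads one large outlier across all coordinates, so that more of the 4-bit codes end up in use.

```python
import math

def hadamard_transform(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]

def quantize_int4(x):
    """Symmetric per-tensor quantization to integer codes in [-8, 7] (an assumed scheme)."""
    max_abs = max(abs(v) for v in x)
    step = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / step))) for v in x]
    return codes, step

# Hypothetical activation row: small values plus a single large outlier.
x = [0.1, -0.2, 0.15, 0.05, -0.1, 0.2, -0.05, 8.0]

# Quantize directly: the outlier sets the step size for every entry.
codes_plain, step_plain = quantize_int4(x)

# Quantize in the Hadamard domain: the outlier is spread over all coordinates.
xh = hadamard_transform(x)
codes_h, step_h = quantize_int4(xh)

print("max |x|  before transform:", max(abs(v) for v in x))
print("max |Hx| after transform: ", max(abs(v) for v in xh))
print("nonzero codes, plain:   ", sum(1 for c in codes_plain if c != 0))
print("nonzero codes, Hadamard:", sum(1 for c in codes_h if c != 0))
```

Because the normalized transform is its own inverse, applying `hadamard_transform` to the dequantized codes maps them back to the original domain. Whether this lowers the overall quantization error depends on the data, so the sketch makes no claim about accuracy.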

artificial intelligence, machine learning, natural language, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

Neural Information Processing Systems | Apr-25-2026, 04:01:41 GMT

Inspired by BatchNorm, there has been an explosion of normalization layers in deep learning. Recent works have identified a multitude of beneficial properties in BatchNorm to explain its success. However, given the pursuit of alternative normalization layers, these properties need to be generalized so that any given layer's success/failure can be accurately predicted. In this work, we take a first step towards this goal by extending known properties of BatchNorm in randomly initialized deep neural networks (DNNs) to several recently proposed normalization layers. Our primary findings follow: (i) similar to BatchNorm, activations-based normalization layers can prevent exponential growth of activations in ResNets, but parametric techniques require explicit remedies; (ii) use of GroupNorm can ensure an informative forward propagation, with different samples being assigned dissimilar activations, but increasing group size results in increasingly indistinguishable activations for different samples, explaining slow convergence speed in models with LayerNorm; and (iii) small group sizes result in large gradient norm in earlier layers, hence explaining training instability issues in Instance Normalization and illustrating a speed-stability tradeoff in GroupNorm. Overall, our analysis reveals a unified set of mechanisms that underpin the success of normalization methods in deep learning, providing us with a compass to systematically explore the vast design space of DNN normalization layers.
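The role of group size in this abstract is easier to see with the normalization written out. The following is a minimal sketch of group normalization for a single sample, without learnable scale and shift. The function name, the `eps` default, and the toy `sample` are assumptions for illustration. With one channel per group it behaves like Instance Normalization, and with a single group spanning all channels it behaves like LayerNorm over the sample.

```python
import math

def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample, given as a list of channels (each a list of
    spatial values), using statistics shared within each channel group."""
    num_channels = len(sample)
    assert num_channels % num_groups == 0, "channels must split evenly into groups"
    per_group = num_channels // num_groups
    out = []
    for g in range(num_groups):
        channels = sample[g * per_group:(g + 1) * per_group]
        values = [v for ch in channels for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * inv_std for v in ch] for ch in channels])
    return out

# Hypothetical sample: 4 channels, 3 spatial positions each.
sample = [
    [1.0, 2.0, 3.0],
    [10.0, 12.0, 14.0],
    [-1.0, 0.0, 1.0],
    [5.0, 5.5, 6.0],
]

layer_like = group_norm(sample, num_groups=1)     # one group: LayerNorm-like
grouped = group_norm(sample, num_groups=2)        # two channels per group
instance_like = group_norm(sample, num_groups=4)  # one channel per group: InstanceNorm-like

for name, result in [("1 group", layer_like), ("2 groups", grouped), ("4 groups", instance_like)]:
    print(name, [[round(v, 3) for v in ch] for ch in result])
```

With larger groups, channels keep their offsets relative to one another inside a group. With one channel per group, every channel is centered on its own.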

artificial intelligence, batchnorm, machine learning, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

1f9f9d8ff75205aa73ec83e543d8b571-Supplemental.pdf

Neural Information Processing Systems | Apr-25-2026, 01:03:06 GMT

We repeat the theorems presented in Sec. 3 and provide their proofs below. The theorems hold for Neumann boundary conditions, which we use in our implementation; this is achieved by the construction of the differential operators. The proofs follow the ones presented in [22]. If the activation function $\sigma(\cdot)$ is monotonically non-decreasing and sign-preserving, then the forward propagation through the diffusive PDE in (1) for $t \in [0, \infty)$ yields a non-increasing feature norm, that is, $\partial_t \|f\|^2 \le 0$. Proof. Let us examine the following inner product following Eq.
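The statement can be checked numerically on a toy problem. Eq. (1) is not reproduced in this excerpt, so the sketch below assumes a diffusive form $f_t = -G^\top \sigma(G f)$, with $G$ the forward-difference operator on a path graph and $\sigma = \tanh$, which is monotone and sign-preserving. The operator, the step size `h`, and the initial features `f0` are all assumptions for illustration, and explicit Euler time stepping only approximates the continuous-time claim.

```python
import math

def grad(f):
    """Forward differences along a path graph: one value per edge.
    No edges leave the two end nodes, so there is no flux across the boundary."""
    return [f[i + 1] - f[i] for i in range(len(f) - 1)]

def grad_T(e, n):
    """Transpose of grad: scatter edge values back onto the n nodes."""
    out = [0.0] * n
    for i, v in enumerate(e):
        out[i] -= v
        out[i + 1] += v
    return out

def euler_step(f, h):
    """One explicit Euler step of the assumed dynamics f_t = -G^T tanh(G f)."""
    flux = [math.tanh(v) for v in grad(f)]
    update = grad_T(flux, len(f))
    return [fi - h * ui for fi, ui in zip(f, update)]

def norm(f):
    return math.sqrt(sum(v * v for v in f))

# Hypothetical initial node features and a small step size.
f0 = [1.0, -2.0, 0.5, 3.0, -1.0, 0.0]
h = 0.1

f = list(f0)
norms = [norm(f)]
for _ in range(50):
    f = euler_step(f, h)
    norms.append(norm(f))

print("initial norm:", norms[0])
print("final norm:  ", norms[-1])
print("non-increasing:", all(b <= a + 1e-12 for a, b in zip(norms, norms[1:])))
```

In continuous time, the assumed dynamics give $\frac{1}{2}\,\partial_t \|f\|^2 = \langle f, f_t \rangle = -\langle G f, \sigma(G f) \rangle \le 0$, because a sign-preserving $\sigma$ makes each product $x\,\sigma(x)$ non-negative. The discrete iteration inherits this only for a sufficiently small step size.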

architecture, artificial intelligence, machine learning, (14 more...)

Country: North America > United States (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

80f2f15983422987ea30d77bb531be86-Supplemental.pdf

Neural Information Processing Systems | Feb-19-2026, 05:09:22 GMT

backward propagation, forward propagation, propagation, (17 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

99fc8bc48b917c301a80cb74d91c0c06-Paper-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 02:17:30 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Country:

Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

edea298442a67de045e88dfb6e5ea4a2-Supplemental.pdf

Neural Information Processing Systems | Feb-11-2026, 19:16:44 GMT

flop, forward propagation, learning quantized weight, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.99)

BBoE: Leveraging Bundle of Edges for Kinodynamic Bidirectional Motion Planning

Raghu, Srikrishna Bangalore, Roncone, Alessandro

arXiv.org Artificial Intelligence | Sep-25-2025

Abstract: In this work, we introduce BBoE, a bidirectional, kinodynamic, sampling-based motion planner that consistently and quickly finds low-cost solutions in environments with varying obstacle clutter. The algorithm combines exploration and exploitation while relying on precomputed robot state traversals, resulting in efficient convergence towards the goal. Our key contributions include: i) a strategy to navigate through obstacle-rich spaces by sorting and sequencing preprocessed forward propagations; and ii) BBoE, a robust bidirectional kinodynamic planner that utilizes this strategy to produce fast and feasible solutions. The proposed framework reduces planning time, lowers solution cost, and increases success rate in comparison to previous approaches.

I. INTRODUCTION

Motion planning in robotics involves identifying a series of valid configurations that a robot can assume to transition from an initial state to a desired goal state. Sampling-based planning is a popular graph-based approach used to generate robot motions by sampling discrete states and establishing connections between them via edges [23]. Its popularity is due to the inherent property of probabilistic completeness, which guarantees that a solution will be found, if one exists, as the number of sampled states approaches infinity [17], [10]. Traditionally, these techniques employ a unidirectional tree that grows from the start state and expands towards the goal region [17], [10], [6].

artificial intelligence, international conference, propagation, (15 more...)

2509.20333

Country:

Asia (0.28)
North America > United States > Colorado (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.87)

edea298442a67de045e88dfb6e5ea4a2-Supplemental.pdf

Neural Information Processing Systems | Aug-18-2025, 15:42:39 GMT

artificial intelligence, flop, machine learning, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Supplemental Material: Efficient Neural Network Training via Forward and Backward Propagation Sparsification

Neural Information Processing Systems | Aug-15-2025, 12:02:05 GMT

This appendix can be divided into four parts. Section A gives the detailed proof of Theorem 1 and discusses the convergence of our method. Before giving the detailed proof, we would like to present the following two properties of overparameterized deep neural networks, which are implied by the latest studies based on the mean field theory. We will empirically verify these properties in this section and adopt them as assumptions in our proof. That's why Property 1 holds.

artificial intelligence, machine learning, propagation, (13 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
