AITopics | backwardpass

Collaborating Authors

backwardpass

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Does This Gradient Spark Joy?

Osband, Ian

arXiv.org Machine LearningMar-24-2026

Policy gradient computes a backward pass for every sample, even though the backward pass is expensive and most samples carry little learning value. The Delightful Policy Gradient (DG) provides a forward-pass signal of learning value: \emph{delight}, the product of advantage and surprisal (negative log-probability). We introduce the \emph{Kondo gate}, which compares delight against a compute price and pays for a backward pass only when the sample is worth it, thereby tracing a quality--cost Pareto frontier. In bandits, zero-price gating preserves useful gradient signal while removing perpendicular noise, and delight is a more reliable screening signal than additive combinations of value and surprise. On MNIST and transformer token reversal, the Kondo gate skips most backward passes while retaining nearly all of DG's learning quality, with gains that grow as problems get harder and backward passes become more expensive. Because the gate tolerates approximate delight, a cheap forward pass can screen samples before expensive backpropagation, suggesting a speculative-decoding-for-training paradigm.

artificial intelligence, backward pass, machine learning, (16 more...)

arXiv.org Machine Learning

2603.20526

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

a284df1155ec3e67286080500df36a9a-Paper.pdf

Neural Information Processing SystemsFeb-10-2026, 09:43:05 GMT

Recent approaches include priors on the feature attribution of a deep neural network (DNN) into the training process to reduce the dependence on unwanted features. However, until now one needed to trade off high-quality attributions, satisfying desirable axioms, against the time required to compute them. This in turn either led to long training times or ineffective attribution priors.

artificial intelligence, attribution, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Marche > Ancona Province > Ancona (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

MT2ST: Adaptive Multi-Task to Single-Task Learning

Liu, Dong, Jiang, Meng

arXiv.org Artificial IntelligenceJun-25-2024

The conventional training approaches often face challenges in balancing the breadth of multi-task learning (MTL) with the depth of single-task learning (STL). To address this issue, we introduce the Multi-Task to Single-Task (MT2ST) framework, a groundbreaking approach that can combine the generalizability of MTL with the precision of STL. Our work include two strategies: 'Diminish' and 'Switch'. 'Diminish' Strategy will gradually reduce the influence of auxiliary tasks, while the 'Switch' strategy involves a shift from multi-tasking to single-tasking at a specific timepoint at the training process. In this paper, we propose the Multi-Task to Single-Task (MT2ST) framework, a novel approach that significantly enhances the efficiency and accuracy of word embedding training while concurrently addressing prevalent issues such as overfitting. Our empirical studies demonstrate that MT2ST can reduce training time by 67\% when contrasted with single-task learning approaches, and by 13\% compared to traditional multi-task learning methods. These findings underscore MT2ST's potential to be a powerful tools for word embedding training acceleration.

learning, mt2st, multi-task learning, (13 more...)

arXiv.org Artificial Intelligence

2406.18038

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(2 more...)

Genre:

Research Report > Promising Solution (0.54)
Overview > Innovation (0.54)
Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback