AITopics | Africa

Collaborating Authors

Africa

Parallelizing Linear Transformers with the Delta Rule over Sequence Length Songlin Y ang Bailin Wang Y u Zhang Yikang Shen Y oon Kim Massachusetts Institute of Technology Soochow University

Neural Information Processing SystemsFeb-18-2026, 06:03:05 GMT

Transformers with linear attention (i.e., linear transfor mers) and state-space models have recently been suggested as a viable linear-time alt ernative to transformers with softmax attention. However, these models still underp erform transformers especially on tasks that require in-context retrieval. Whil e more expressive variants of linear transformers which replace the additive upda te in linear transformers with the delta rule [DeltaNet; 101 ] have been found to be more effective at associative recall, existing algorithms for training such mode ls do not parallelize over sequence length and are thus inefficient to train on modern ha rdware. This work describes a hardware-efficient algorithm for training line ar transformers with the delta rule, which exploits a memory-efficient representati on for computing products of Householder matrices [ 11 ]. This algorithm allows us to scale up DeltaNet to standard language modeling settings. We train a 1.3B mode l for 100B tokens and find that it outperforms recent linear-time baselines su ch as Mamba [ 31 ] and GLA [ 124 ] in terms of perplexity and zero-shot performance on downst ream tasks. We also experiment with two hybrid models which combine Delt aNet layers with (1) sliding-window attention layers every other layer or (2) two global attention layers, and find that these hybrids outperform strong transf ormer baselines.

arxiv preprint, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Rwanda > Kigali > Kigali (0.04)
North America > United States > Maryland > Baltimore (0.04)
(19 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.67)
Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Agent Planning with World Knowledge Model

Neural Information Processing SystemsFeb-18-2026, 05:42:37 GMT

Imitating humans' mental world knowledge model which provides global prior knowledge before the task and maintains local dynamic knowledge during the task, in this paper, we introduce parametric W orld K nowledge M odel ( WKM) to facilitate agent

knowledge management, large language model, machine learning, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
Asia > Singapore (0.04)
North America > Canada (0.04)
(11 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Education > Educational Setting (0.67)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
(3 more...)

Add feedback

Supplementary Material and Datasheet: Off to new Shores: A Dataset & Benchmark for (near-)coastal Flood Inundation Forecasting Contents

Neural Information Processing SystemsFeb-18-2026, 05:42:18 GMT

This supplementary document follows the Datasheets for Datasets template of (8) to document the Global Flood Forecasting (GFF) dataset and its creation. Further resources are provided: in the accompanying publication https://arxiv.org/abs/2409.18591 in the GitHub repository https://github.com/Multihuntr/GFF

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Alaska (0.04)
North America > United States > Colorado (0.04)
Europe > Germany > Brandenburg > Potsdam (0.04)
Africa (0.04)

Genre: Research Report (0.66)

Industry:

Law (1.00)
Government (1.00)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Software (0.68)

Add feedback

d029726ea3e0b6050d0ec666099964cd-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-18-2026, 05:42:16 GMT

climate zone, dataset, forecasting, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Alaska (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Brandenburg > Potsdam (0.04)
(2 more...)

Genre: Research Report (0.68)

Industry:

Government (0.68)
Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

d00fcdd0629dabdf515b1e6425a261bb-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 05:41:47 GMT

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Uruguay (0.04)
Oceania > New Zealand (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government (1.00)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Coded Computing for Resilient Distributed Computing: A Learning-Theoretic Framework

Neural Information Processing SystemsFeb-18-2026, 04:19:55 GMT

Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a combination of the data, rather than the raw data itself. The final result then is decoded from the collective outputs of the worker nodes. However, there is a significant gap between current coded computing approaches and the broader landscape of general distributed computing, particularly when it comes to machine learning workloads. To bridge this gap, we propose a novel foundation for coded computing, integrating the principles of learning theory, and developing a framework that seamlessly adapts with machine learning applications. In this framework, the objective is to find the encoder and decoder functions that minimize the loss function, defined as the mean squared error between the estimated and true values. Facilitating the search for the optimum decoding and functions, we show that the loss function can be upper-bounded by the summation of two terms: the generalization error of the decoding function and the training error of the encoding function. Focusing on the second-order Sobolev space, we then derive the optimal encoder and decoder.

artificial intelligence, enc, machine learning, (16 more...)

Neural Information Processing Systems

Country: