AITopics | Oceania

Parallelizing Linear Transformers with the Delta Rule over Sequence Length

Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, Yoon Kim (Massachusetts Institute of Technology; Soochow University)

Neural Information Processing Systems, Oct-10-2025, 17:21:29 GMT

Transformers with linear attention (i.e., linear transformers) and state-space models have recently been suggested as a viable linear-time alternative to transformers with softmax attention. However, these models still underperform transformers, especially on tasks that require in-context retrieval. While more expressive variants of linear transformers which replace the additive update in linear transformers with the delta rule [DeltaNet; 101] have been found to be more effective at associative recall, existing algorithms for training such models do not parallelize over sequence length and are thus inefficient to train on modern hardware. This work describes a hardware-efficient algorithm for training linear transformers with the delta rule, which exploits a memory-efficient representation for computing products of Householder matrices [11]. This algorithm allows us to scale up DeltaNet to standard language modeling settings. We train a 1.3B model for 100B tokens and find that it outperforms recent linear-time baselines such as Mamba [31] and GLA [124] in terms of perplexity and zero-shot performance on downstream tasks. We also experiment with two hybrid models which combine DeltaNet layers with (1) sliding-window attention layers every other layer or (2) two global attention layers, and find that these hybrids outperform strong transformer baselines.

arxiv preprint, deltanet, international conference, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Rwanda > Kigali > Kigali (0.04)
North America > United States > Maryland > Baltimore (0.04)
(19 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.67)
Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Image2Struct: Benchmarking Structure Extraction for Vision-Language Models

Neural Information Processing Systems, Oct-10-2025, 17:19:16 GMT

We introduce Image2Struct, a benchmark to evaluate vision-language models (VLMs) on extracting structure from images.

benchmark, dataset, similarity, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
Oceania > Australia (0.04)
North America > Montserrat (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (0.67)
Government (0.67)
Law > Intellectual Property & Technology Law (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Multi-Group Proportional Representation in Retrieval

Neural Information Processing Systems, Oct-10-2025, 17:09:44 GMT

Current approaches to mitigate these representational harms balance the number of retrieved items across population groups defined by a small number of (often binary) attributes. However, most existing methods overlook intersectional groups determined by combinations of group attributes, such as gender, race, and ethnicity.

mpr, proportional representation, representation, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Uruguay (0.04)
Oceania > New Zealand (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government (1.00)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

FIDE: Frequency-Inflated Conditional Diffusion Model for Extreme-Aware Time Series Generation

Neural Information Processing Systems, Oct-10-2025, 17:08:54 GMT

It also extends the traditional diffusion-based model, enabling the generation of samples conditioned on the block maxima, thereby enhancing the model's

block maxima, block maxima value, time series, (12 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Michigan (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Industry:

Information Technology (0.67)
Banking & Finance (0.67)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

ced76a666704e381c3039871ffe558ee-Paper-Conference.pdf

Neural Information Processing Systems, Oct-10-2025, 17:01:24 GMT

bleurt, slt, translation, (14 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
Europe > Switzerland (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(19 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

cdf6f8e9fd9aeaf79b6024caec24f15b-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Oct-10-2025, 16:59:51 GMT

To ensure quality, all charts and questions are handpicked, curated, and verified by human experts.

claude 3, final answer, subplot, (13 more...)

Neural Information Processing Systems

Country:

Europe > Italy (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
(9 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Law (1.00)
Banking & Finance (0.92)
Government (0.92)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(5 more...)

Compositional Generalization Across Distributional Shifts with Sparse Tree Operations

Neural Information Processing Systems, Oct-10-2025, 16:51:48 GMT

Work partially completed while at Microsoft Research.

generalization, representation, sdtm, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(5 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

DisC-GS: Discontinuity-aware Gaussian Splatting

Neural Information Processing Systems, Oct-10-2025, 16:47:36 GMT

Additionally, we introduce a Bézier-boundary gradient approximation strategy within our framework to keep the "differentiability"

gaussian distribution, gaussian splatting, bézier curve, (14 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > Queensland (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Coded Computing for Resilient Distributed Computing: A Learning-Theoretic Framework

Neural Information Processing Systems, Oct-10-2025, 16:40:03 GMT

Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a combination of the data, rather than the raw data itself. The final result is then decoded from the collective outputs of the worker nodes. However, there is a significant gap between current coded computing approaches and the broader landscape of general distributed computing, particularly when it comes to machine learning workloads. To bridge this gap, we propose a novel foundation for coded computing, integrating the principles of learning theory, and developing a framework that seamlessly adapts to machine learning applications. In this framework, the objective is to find the encoder and decoder functions that minimize the loss function, defined as the mean squared error between the estimated and true values. To facilitate the search for the optimum encoding and decoding functions, we show that the loss function can be upper-bounded by the summation of two terms: the generalization error of the decoding function and the training error of the encoding function. Focusing on the second-order Sobolev space, we then derive the optimal encoder and decoder.

computation, computing, enc, (14 more...)

Neural Information Processing Systems

Country: