AITopics | decoder layer

Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

Neural Information Processing SystemsJun-15-2026, 03:26:06 GMT

Discrete diffusion models enable parallel token sampling for faster inference than autoregressive approaches. However, prior diffusion models use a decoder-only architecture, which requires sampling algorithms that invoke the full network at every denoising step and incur high computational cost. Our key insight is that discrete diffusion models perform two types of computation: 1) representing clean tokens and 2) denoising corrupted tokens, which enables us to use separate modules for each task. We propose an encoder-decoder architecture to accelerate discrete diffusion inference, which relies on an encoder to represent clean tokens and a lightweight decoder to iteratively refine a noised sequence. We also show that this architecture enables faster training of block diffusion models, which partition sequences into blocks for better quality and are commonly used in diffusion language model inference. We introduce a framework for Efficient Encoder-Decoder Diffusion (E2D2), consisting of an architecture with specialized training and sampling algorithms, and we show that E2D2 achieves superior trade-offs between generation quality and inference throughput on summarization, translation, and mathematical reasoning tasks. We provide the code1, model weights, and blog post on the project page: https://m-arriola.com/e2d2.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Dual-Path Temporal Decoder for End-to-End Multi-Object Tracking

Neural Information Processing SystemsJun-14-2026, 11:41:27 GMT

We present a novel end-to-end transformer-based framework for Multiple Object Tracking (MOT) that advances temporal modeling and identity preservation. Despite recent progress in transformer-based MOT, existing methods still struggle to maintain consistent object identities across frames, especially under occlusions, appearance changes, or detection failures. We propose a dual-path temporal decoder that explicitly separates appearance adaptation and identity preservation. The appearance-adaptive decoder dynamically updates query features using current frame information, while the identity-preserving decoder freezes query features and reuses historical sampling offsets to maintain long-term temporal consistency. To further enhance stability, we introduce a confidence-guided update suppression strategy that retains previously reliable features when predictions are unreliable. Extensive experiments on MOT benchmarks demonstrate that our approach achieves state-of-the-art performance across major tracking metrics, with significant gains in association accuracy and identity consistency. Our results demonstrate the importance of decoupling dynamic appearance modeling from static identity cues, and provide a scalable foundation for robust tracking in complex scenarios.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.57)

Add feedback

Unlimiformer: Long-Range Transformers with Unlimited Length Input

Neural Information Processing SystemsApr-28-2026, 13:40:56 GMT

Since the proposal of transformers (Vaswani et al., 2017), these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose Unlimiformer: a general approach that wraps any existing pretrained encoder-decoder transformer, and offloads the cross-attention computation to a single k-nearest-neighbor (kNN) index, while the returned kNN distances are the attention dot-product scores. This kNN index can be kept on either the GPU or CPU memory and queried in sub-linear time; this way, we can index practically unlimited input sequences, while every attention head in every decoder layer retrieves its top-k keys, instead of attending to every key. We evaluate Unlimiformer on several long-document and book-summarization benchmarks, showing that it can process even 500k token-long inputs from the BookSum dataset, without any input truncation at test time. We demonstrate that Unlimiformer improves pretrained models such as BART (Lewis et al., 2020a) and Longformer (Beltagy et al., 2020) by extending them to unlimited inputs without additional learned weights and without modifying their code. Our code and models are publicly available, and support LLaMA-2 as well2.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Personal (0.67)

Industry:

Law (0.67)
Health & Medicine > Therapeutic Area (0.46)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.86)

Add feedback

2ab47c960bfee4f86dfc362f26ad066a-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 06:01:06 GMT

artificial intelligence, machine learning, trajectory, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.71)

Add feedback

2ab47c960bfee4f86dfc362f26ad066a-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 06:01:03 GMT

machine learning, natural language, prediction, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Add feedback

1cdf14d1e3699d61d237cf76ce1c2dca-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 23:51:00 GMT

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Europe > Germany (0.28)

Genre: Research Report (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

1d774c112926348c3e25ea47d87c835b-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 23:34:39 GMT

artificial intelligence, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.38)

Add feedback

DeepInteraction: 3DObject Detection via Modality Interaction

Neural Information Processing SystemsApr-24-2026, 13:31:07 GMT

Existing top-performance 3D object detectors typically rely on the multi-modal fusion strategy. This design is however fundamentally restricted due to overlooking the modality-specific useful information and finally hampering the model performance. To address this limitation, in this work we introduce a novel modality interaction strategy where individual per-modality representations are learned and maintained throughout for enabling their unique characteristics to be exploited during object detection. To realize this proposed strategy, we design a DeepInteractionarchitecture characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Experiments on the large-scale nuScenes dataset show that our proposed method surpasses all prior arts often by a large margin. Crucially, our method is ranked at the first position at the highly competitive nuScenes object detection leaderboard.

artificial intelligence, machine learning, representation, (17 more...)

Neural Information Processing Systems

Industry: Information Technology (0.46)

Technology: