AITopics | encoder-decoder architecture

encoder-decoder architecture

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

412732f172bdd5ad0efde2fafa110700-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Feb-10-2026, 23:11:22 GMT

architecture, dataset, reconstruction

Neural Information Processing Systems

Country: Asia > Nepal (0.04)

Genre: Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Health Care Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

Arriola, Marianne, Schiff, Yair, Phung, Hao, Gokaslan, Aaron, Kuleshov, Volodymyr

arXiv.org Artificial Intelligence, Oct-28-2025

Discrete diffusion models enable parallel token sampling for faster inference than autoregressive approaches. However, prior diffusion models use a decoder-only architecture, which requires sampling algorithms that invoke the full network at every denoising step and incur high computational cost. Our key insight is that discrete diffusion models perform two types of computation: 1) representing clean tokens and 2) denoising corrupted tokens, which enables us to use separate modules for each task. We propose an encoder-decoder architecture to accelerate discrete diffusion inference, which relies on an encoder to represent clean tokens and a lightweight decoder to iteratively refine a noised sequence. We also show that this architecture enables faster training of block diffusion models, which partition sequences into blocks for better quality and are commonly used in diffusion language model inference. We introduce a framework for Efficient Encoder-Decoder Diffusion (E2D2), consisting of an architecture with specialized training and sampling algorithms, and we show that E2D2 achieves superior trade-offs between generation quality and inference throughput on summarization, translation, and mathematical reasoning tasks. We provide the code, model weights, and blog post on the project page: https://m-arriola.com/e2d2
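The compute split described in this abstract (an encoder that represents clean tokens, a lightweight decoder that refines a noised block) can be illustrated with a toy sampler. This is a minimal sketch under stated assumptions, not the E2D2 code: `sample_blocks`, `toy_encode`, `toy_decode`, the `MASK` id, and the schedule that reveals a fixed share of masked positions per step are all hypothetical. It only shows where each network is invoked: the encoder once per block, the decoder once per denoising step.

```python
import random
from typing import Callable, List, Sequence

MASK = -1  # hypothetical id marking a masked (noised) position


def sample_blocks(
    prompt: Sequence[int],
    num_blocks: int,
    block_size: int,
    steps: int,
    encode: Callable[[List[int]], int],
    decode_step: Callable[[int, List[int]], List[int]],
    rng: random.Random,
) -> List[int]:
    """Toy block-wise masked-diffusion sampler with an encoder/decoder split."""
    seq = list(prompt)
    for _ in range(num_blocks):
        # Encoder: represent the clean tokens once per block.
        context = encode(seq)
        block = [MASK] * block_size
        for t in range(steps):
            # Decoder: called at every denoising step, reusing the cached context.
            proposals = decode_step(context, block)
            masked = [i for i, tok in enumerate(block) if tok == MASK]
            # Reveal ceil(len(masked) / remaining_steps) positions so the block
            # is fully denoised after the final step.
            k = -(-len(masked) // (steps - t))
            for i in rng.sample(masked, k):
                block[i] = proposals[i]
        seq.extend(block)
    return seq


# Toy stand-ins for the two networks; they only count how often they are called.
calls = {"encode": 0, "decode": 0}


def toy_encode(tokens: List[int]) -> int:
    calls["encode"] += 1
    return sum(tokens)


def toy_decode(context: int, block: List[int]) -> List[int]:
    calls["decode"] += 1
    return [(context + i) % 10 for i in range(len(block))]


out = sample_blocks([1, 2, 3], num_blocks=2, block_size=4, steps=3,
                    encode=toy_encode, decode_step=toy_decode, rng=random.Random(0))
print(calls)  # {'encode': 2, 'decode': 6}
```

In a decoder-only model every one of those six denoising calls would run the full network; here only the cheap decoder is repeated.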

arxiv preprint arxiv, large language model, machine learning

arXiv.org Artificial Intelligence

2510.22852

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Deep Neural ODE Operator Networks for PDEs

Li, Ziqian, Liu, Kang, Song, Yongcun, Yue, Hangrui, Zuazua, Enrique

arXiv.org Artificial Intelligence, Oct-20-2025

Operator learning has emerged as a promising paradigm for developing efficient surrogate models to solve partial differential equations (PDEs). However, existing approaches often overlook the domain knowledge inherent in the underlying PDEs and therefore struggle to capture temporal dynamics and to generalize beyond the training time frame. This paper introduces a deep neural ordinary differential equation (ODE) operator network framework, termed NODE-ONet, to alleviate these limitations. The framework adopts an encoder-decoder architecture comprising three core components: an encoder that spatially discretizes input functions, a neural ODE that captures latent temporal dynamics, and a decoder that reconstructs solutions in physical space. On the theoretical side, we provide an error analysis of the encoder-decoder architecture. On the computational side, we propose novel physics-encoded neural ODEs that incorporate PDE-specific physical properties. These neural ODEs significantly reduce the framework's complexity while improving numerical efficiency, robustness, applicability, and generalization capacity. Numerical experiments on nonlinear diffusion-reaction and Navier-Stokes equations demonstrate high accuracy, computational efficiency, and prediction capabilities beyond the training time frame. Additionally, the framework's flexibility to accommodate diverse encoders and decoders, and its ability to generalize across related PDE families, further underscore its potential as a scalable, physics-encoded tool for scientific machine learning.
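The encoder, latent ODE, and decoder pipeline can be sketched on the 1-D heat equation u_t = ν u_xx on [0, 1] with zero boundary values. This is an illustrative assumption, not the NODE-ONet model: the encoder projects the input function onto sine modes, the latent vector field is the known spectral heat operator standing in for a learned neural ODE, and classical RK4 integrates it. For u0(x) = sin(πx) the exact solution at x = 0.5 is exp(-νπ²t), which gives a built-in check.

```python
import math
from typing import Callable, List

Vector = List[float]


def encode(u0: Callable[[float], float], n_modes: int, n_quad: int = 200) -> Vector:
    """Encoder: project u0 on [0, 1] onto sin(k*pi*x) modes with the midpoint rule."""
    xs = [(j + 0.5) / n_quad for j in range(n_quad)]
    return [2.0 / n_quad * sum(u0(x) * math.sin(k * math.pi * x) for x in xs)
            for k in range(1, n_modes + 1)]


def latent_field(nu: float) -> Callable[[float, Vector], Vector]:
    """Hand-written latent dynamics (heat equation in sine modes); in a neural
    ODE a trained network would play this role."""
    return lambda t, z: [-nu * (k * math.pi) ** 2 * zk for k, zk in enumerate(z, start=1)]


def rk4(f: Callable[[float, Vector], Vector], z: Vector, t_end: float, n_steps: int) -> Vector:
    """Integrate dz/dt = f(t, z) from t = 0 to t_end with classical RK4."""
    dt, t = t_end / n_steps, 0.0
    for _ in range(n_steps):
        k1 = f(t, z)
        k2 = f(t + dt / 2, [zi + dt / 2 * a for zi, a in zip(z, k1)])
        k3 = f(t + dt / 2, [zi + dt / 2 * a for zi, a in zip(z, k2)])
        k4 = f(t + dt, [zi + dt * a for zi, a in zip(z, k3)])
        z = [zi + dt / 6 * (a + 2 * b + 2 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
        t += dt
    return z


def decode(z: Vector) -> Callable[[float], float]:
    """Decoder: map latent coefficients back to a function of x."""
    return lambda x: sum(zk * math.sin(k * math.pi * x) for k, zk in enumerate(z, start=1))


nu = 0.1
z0 = encode(lambda x: math.sin(math.pi * x), n_modes=4)
u_T = decode(rk4(latent_field(nu), z0, t_end=1.0, n_steps=100))
print(u_T(0.5), math.exp(-nu * math.pi ** 2))  # the two values agree closely
```

Because time evolution happens only in the latent space, predicting past the training horizon amounts to integrating the latent ODE further.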

artificial intelligence, machine learning, operator

arXiv.org Artificial Intelligence

2510.15651

Country:

North America > United States (0.46)
Europe > Spain (0.46)
Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Unsupervised Speech Enhancement using Data-defined Priors

Klement, Dominik, Maciejewski, Matthew, Khudanpur, Sanjeev, Černocký, Jan, Burget, Lukáš

arXiv.org Artificial Intelligence, Sep-30-2025

The majority of deep learning-based speech enhancement methods require paired clean-noisy speech data. Collecting such data at scale in real-world conditions is infeasible, which has led the community to rely on synthetically generated noisy speech. However, this introduces a gap between the training and testing phases. In this work, we propose a novel dual-branch encoder-decoder architecture for unsupervised speech enhancement that separates the input into clean speech and residual noise. Adversarial training is employed to impose priors on each branch, defined by unpaired datasets of clean speech and, optionally, noise. Experimental results show that our method achieves performance comparable to leading unsupervised speech enhancement approaches. Furthermore, we demonstrate the critical impact of clean speech data selection on enhancement performance. In particular, our findings reveal that performance may appear overly optimistic when in-domain clean speech data are used for prior definition -- a practice adopted in previous unsupervised speech enhancement studies.

artificial intelligence, deep learning, machine learning

arXiv.org Artificial Intelligence

2509.22942

Country: Europe (0.28)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Symmetry-Aware Transformer Training for Automated Planning

Fritzsche, Markus, Gestrin, Elliot, Seipp, Jendrik

arXiv.org Artificial Intelligence, Aug-12-2025

While transformers excel in many settings, their application in the field of automated planning is limited. Prior work like PlanGPT, a state-of-the-art decoder-only transformer, struggles with extrapolation from easy to hard planning problems. This in turn stems from problem symmetries: planning tasks can be represented with arbitrary variable names that carry no meaning beyond being identifiers. This causes a combinatorial explosion of equivalent representations that pure transformers cannot efficiently learn from. We propose a novel contrastive learning objective to make transformers symmetry-aware and thereby compensate for their lack of inductive bias. Combining this with architectural improvements, we show that transformers can be efficiently trained for either plan-generation or heuristic-prediction. Our results across multiple planning domains demonstrate that our symmetry-aware training effectively and efficiently addresses the limitations of PlanGPT.
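The symmetry described here can be made concrete: consistently renaming objects yields an equivalent task, and a contrastive loss can pull the embeddings of equivalent encodings together. The sketch below assumes a standard InfoNCE loss; the abstract does not say which contrastive objective the authors use, and the `rename_objects` and `info_nce` helpers, the blocks-world atoms, and the embedding vectors are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Atom = Tuple[str, ...]  # predicate followed by object names, e.g. ("on", "b1", "b2")


def rename_objects(atoms: Sequence[Atom], rng: random.Random) -> List[Atom]:
    """Return an equivalent task by consistently permuting object names."""
    objects = sorted({arg for atom in atoms for arg in atom[1:]})
    shuffled = objects[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(objects, shuffled))
    return [(atom[0], *(mapping[arg] for arg in atom[1:])) for atom in atoms]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor: Sequence[float], candidates: Sequence[Sequence[float]],
             positive: int, temperature: float = 0.1) -> float:
    """InfoNCE: cross-entropy of picking candidates[positive] among all candidates."""
    logits = [cosine(anchor, c) / temperature for c in candidates]
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_norm - logits[positive]


task = [("on", "b1", "b2"), ("clear", "b1"), ("ontable", "b2")]
renamed = rename_objects(task, random.Random(0))
# In training these embeddings would come from the transformer; here they are made up.
anchor = [1.0, 0.0]
good = info_nce(anchor, [[0.9, 0.1], [0.0, 1.0]], positive=0)  # renamed copy embeds nearby
bad = info_nce(anchor, [[0.0, 1.0], [0.9, 0.1]], positive=0)   # renamed copy embeds far away
```

The loss is small when a task and its renamed copy map to similar embeddings, which is the invariance a symmetry-aware objective is meant to encourage.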

artificial intelligence, machine learning, planning problem

arXiv.org Artificial Intelligence

2508.07743

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Optimal Linear Baseline Models for Scientific Machine Learning

DeLise, Alexander, Loh, Kyle, Patel, Krish, Teague, Meredith, Arnold, Andrea, Chung, Matthias

arXiv.org Artificial Intelligence, Aug-11-2025

In nearly every scientific discipline, a central challenge lies in modeling, computing, and understanding the functional relationships between signals, measurements, and their underlying physical processes. These mappings typically manifest in three fundamental forms: forward modeling, inference, and autoencoding. While mathematical models often provide insight into these relationships, they are frequently inadequate for real-world prediction and analysis due to limitations in analytical tractability, computational feasibility, or algorithmic robustness. The advent of scientific machine learning (ML) has led to a paradigm shift in which data-driven methods, particularly neural networks, have emerged as powerful tools for learning complex input-output relations directly from data. Unlike traditional model-based approaches, neural networks are capable of overcoming longstanding issues such as computational complexity, limited scalability, model misspecification, and the ill-posedness inherent to many scientific problems [1]. A central strength of neural networks is their capacity to project inputs into a lower-dimensional latent space before mapping to targets, a principle commonly realized in autoencoder and encoder-decoder architectures.
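For the linear case of the latent-space idea mentioned above, a classical fact applies: by the Eckart-Young theorem, the linear encoder-decoder with a k-dimensional latent space that minimizes squared reconstruction error projects centered data onto its top-k principal directions. The sketch below shows the k = 1 case using power iteration. It illustrates that general fact and is not necessarily the baseline construction used in this paper; `fit_rank1`, `encode`, `decode`, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def fit_rank1(data: Sequence[Sequence[float]], iters: int = 100) -> Tuple[Vector, Vector]:
    """Return (mean, v), where v is the top eigenvector of the sample covariance,
    found by power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(x[j] for x in data) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in data]
    cov = [[sum(x[i] * x[j] for x in centered) / n for j in range(d)] for i in range(d)]
    v = [1.0 / (j + 1) for j in range(d)]  # fixed start; must not be orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # data has no variance, so every direction is equally good
            break
        v = [c / norm for c in w]
    return mean, v


def encode(x: Sequence[float], mean: Vector, v: Vector) -> float:
    """Linear encoder: 1-D latent code is the projection onto v."""
    return sum((xi - mi) * vi for xi, mi, vi in zip(x, mean, v))


def decode(z: float, mean: Vector, v: Vector) -> Vector:
    """Linear decoder: map the latent code back to input space."""
    return [mi + z * vi for mi, vi in zip(mean, v)]


data = [[t, 2.0 * t] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]  # points on a line
mean, v = fit_rank1(data)
max_error = max(sum((a - b) ** 2 for a, b in zip(x, decode(encode(x, mean, v), mean, v)))
                for x in data)
```

Since the toy data lie on a single line, one latent dimension reconstructs them exactly, and `v` aligns with the direction (1, 2)/√5.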

artificial intelligence, inverse problem, machine learning

arXiv.org Artificial Intelligence

2508.05831

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.67)

Industry:

Banking & Finance > Trading (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Domain-Adaptive Small Language Models for Structured Tax Code Prediction

Nath, Souvik, Wadhwa, Sumit, Perez, Luis

arXiv.org Artificial Intelligence, Jul-22-2025

Every day, multinational firms process thousands of transactions, each of which must comply with tax regulations that vary by jurisdiction and are often nuanced. Determining product and service tax codes, such as HSN or SAC codes, is a major use case in tax compliance, and accurate codes are essential to avoid tax penalties. This paper proposes a domain-adaptive small language model (SLM) with an encoder-decoder architecture for improved prediction of product and service tax codes. We address the problem of predicting hierarchical tax code sequences from unstructured product and service data. We use an encoder-decoder SLM because it generates tax codes sequentially, which lets it capture the hierarchical dependencies within the codes. Our experiments demonstrate that encoder-decoder SLMs can be successfully applied to the sequential prediction of structured tax codes, a domain that remains comparatively unexplored in current NLP research. We show that domain-adaptive encoder-decoder SLMs outperform flat classifiers on the Harmonized System of Nomenclature (HSN) and achieve better results than decoder-only and encoder-only architectures on structured sequence generation tasks. This approach can also be scaled to other government-mandated tax commodity codes, such as the United Nations Standard Products and Services Codes (UNSPSC) or Brazil's Nomenclatura Comum do Mercosul (NCM).
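Sequential generation of a hierarchical code can be pictured as emitting one level at a time, with each level restricted to valid children of the previous one. The sketch assumes an 8-digit HSN layout (2-digit chapter, 4-digit heading, 6-digit subheading, 8-digit tariff item) and uses greedy decoding constrained by a prefix tree. The helpers `to_hierarchy`, `build_children`, and `constrained_greedy`, the example code strings, and the scores are hypothetical stand-ins for the SLM's decoder; the abstract does not say the paper constrains decoding this way.

```python
from typing import Callable, Dict, List, Set

LEVEL_LENGTHS = (2, 4, 6, 8)  # assumed levels: chapter, heading, subheading, tariff item


def to_hierarchy(code: str) -> List[str]:
    """Split an 8-digit code into the prefix sequence a decoder would emit."""
    if len(code) != LEVEL_LENGTHS[-1] or not code.isdigit():
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    return [code[:n] for n in LEVEL_LENGTHS]


def build_children(codes: List[str]) -> Dict[str, List[str]]:
    """Map each prefix ("" is the root) to its valid next-level prefixes."""
    children: Dict[str, Set[str]] = {}
    for code in codes:
        parent = ""
        for prefix in to_hierarchy(code):
            children.setdefault(parent, set()).add(prefix)
            parent = prefix
    return {p: sorted(c) for p, c in children.items()}


def constrained_greedy(children: Dict[str, List[str]],
                       score: Callable[[str, str], float]) -> List[str]:
    """Greedy decoding that only ever emits a valid child of the previous level."""
    path: List[str] = []
    parent = ""
    while parent in children:
        best = max(children[parent], key=lambda child: score(parent, child))
        path.append(best)
        parent = best
    return path


# Illustrative 8-digit strings, not checked against any tariff schedule.
codes = ["01012100", "01012900", "01022100", "02011000"]
children = build_children(codes)
# Hypothetical model scores for candidate prefixes (unlisted prefixes score 0).
toy_scores = {"01": 0.9, "02": 0.1, "0101": 0.3, "0102": 0.7}
path = constrained_greedy(children, lambda parent, child: toy_scores.get(child, 0.0))
```

Here `path` is `["01", "0102", "010221", "01022100"]`: each emitted level narrows the choices for the next, which is the hierarchical dependency a flat classifier over full codes does not exploit.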

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2507.1088

Country: South America > Brazil (0.25)

Genre: Research Report (0.64)

Industry:

Law > Taxation Law (1.00)
Government > Tax (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Hierarchical Time Series Forecasting Via Latent Mean Encoding

Salatiello, Alessandro, Birr, Stefan, Kunz, Manuel

arXiv.org Artificial Intelligence, Jun-25-2025

Coherently forecasting the behaviour of a target variable across both coarse and fine temporal scales is crucial for profit-optimized decision-making in several business applications, and remains an open research problem in temporal hierarchical forecasting. Here, we propose a new hierarchical architecture that tackles this problem by leveraging modules that specialize in forecasting the different temporal aggregation levels of interest. The architecture, which learns to encode the average behaviour of the target variable within its hidden layers, makes accurate and coherent forecasts across the target temporal hierarchies. We validate our architecture on the challenging, real-world M5 dataset and show that it outperforms established methods, such as the TSMixer model.
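Coherence across temporal levels means that each coarse forecast equals the total of the fine forecasts it covers. The sketch below checks that constraint and shows one simple way to satisfy it by construction: build fine forecasts as a per-window mean plus deviations shifted to sum to zero. This construction is an illustrative assumption loosely inspired by the abstract's "average behaviour" encoding, not the paper's architecture. `aggregate`, `is_coherent`, `combine`, and all numbers are hypothetical.

```python
from typing import List, Sequence


def aggregate(series: Sequence[float], period: int) -> List[float]:
    """Sum non-overlapping windows, e.g. daily values -> weekly totals."""
    if len(series) % period:
        raise ValueError("series length must be a multiple of period")
    return [sum(series[i:i + period]) for i in range(0, len(series), period)]


def is_coherent(fine: Sequence[float], coarse: Sequence[float],
                period: int, tol: float = 1e-9) -> bool:
    """True if every coarse forecast equals the total of its fine forecasts."""
    totals = aggregate(fine, period)
    return len(totals) == len(coarse) and all(abs(a - c) <= tol for a, c in zip(totals, coarse))


def combine(coarse_means: Sequence[float], deviations: Sequence[float],
            period: int) -> List[float]:
    """Fine forecasts = window mean + deviations shifted so they sum to zero."""
    fine: List[float] = []
    for k, mean in enumerate(coarse_means):
        dev = deviations[k * period:(k + 1) * period]
        shift = sum(dev) / period
        fine.extend(mean + d - shift for d in dev)
    return fine


weekly_means = [10.0, 12.0]  # hypothetical coarse-level output (average per day)
raw_dev = [1, -2, 3, 0, 0.5, -1, 2, 0, 0, 1, 1, -3, 2, 4]  # hypothetical fine-level output
fine = combine(weekly_means, raw_dev, period=7)
print(is_coherent(fine, [7 * m for m in weekly_means], period=7))  # True
```

Because the deviations in each window sum to zero, the daily forecasts always add up to the weekly ones, whatever the fine-level module outputs.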

artificial intelligence, deep learning, machine learning

arXiv.org Artificial Intelligence

2506.19633

Genre: Research Report (0.50)

Industry: Information Technology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Generalizable Trajectory Prediction via Inverse Reinforcement Learning with Mamba-Graph Architecture

Li, Wenyun, Huang, Wenjie, Deng, Zejian, Sun, Chen

arXiv.org Artificial Intelligence, Jun-17-2025

Accurate driving behavior modeling is fundamental to safe and efficient trajectory prediction, yet remains challenging in complex traffic scenarios. This paper presents a novel Inverse Reinforcement Learning (IRL) framework that captures human-like decision-making by inferring diverse reward functions, enabling robust adaptation across scenarios. The learned reward function is used to maximize the likelihood of the outputs of an encoder-decoder architecture that combines Mamba blocks, for efficient modeling of long-sequence dependencies, with graph attention networks that encode spatial interactions among traffic agents. Comprehensive evaluations on urban intersections and roundabouts show that the proposed method not only outperforms various popular approaches in prediction accuracy but also achieves 2 times higher generalization performance on unseen scenarios than other IRL-based methods. Modern intelligent transportation systems (ITS) rely on accurate trajectory prediction to enhance road safety and traffic efficiency [1].
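The link between a learned reward and trajectory likelihood can be sketched with the standard maximum-entropy IRL formulation over a finite set of candidate trajectories: P(τ) ∝ exp(w · φ(τ)), and the gradient of the demonstration's log-likelihood is φ(demo) minus the expected features under P. This is an assumed, simplified formulation; the paper's reward model (Mamba blocks plus graph attention) is not reproduced, and the helpers `trajectory_probs` and `irl_step`, the two features, and their values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def trajectory_probs(weights: Sequence[float], features: Sequence[Sequence[float]]) -> Vector:
    """MaxEnt model: P(traj) is proportional to exp(w . phi(traj)) over the candidates."""
    rewards = [sum(w * f for w, f in zip(weights, phi)) for phi in features]
    top = max(rewards)
    exps = [math.exp(r - top) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def irl_step(weights: Sequence[float], features: Sequence[Sequence[float]],
             demo: int, lr: float = 0.5) -> Vector:
    """One gradient-ascent step on log P(demo); the gradient is phi(demo) - E_P[phi]."""
    probs = trajectory_probs(weights, features)
    dim = len(weights)
    expected = [sum(p * phi[j] for p, phi in zip(probs, features)) for j in range(dim)]
    return [weights[j] + lr * (features[demo][j] - expected[j]) for j in range(dim)]


# Hypothetical features per candidate trajectory: [progress, closeness to other agents].
features = [[1.0, 0.2], [0.6, 0.9], [0.3, 0.1]]
weights = [0.0, 0.0]
p_start = trajectory_probs(weights, features)
for _ in range(100):
    weights = irl_step(weights, features, demo=0)  # candidate 0 plays the human demo
p_end = trajectory_probs(weights, features)
```

After training, the demonstrated trajectory receives a higher probability than the uniform 1/3 it started with, and the learned weights favor progress over closeness to other agents.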

machine learning, reinforcement learning, trajectory

arXiv.org Artificial Intelligence

2506.12474

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (0.88)
Transportation > Infrastructure & Services (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
