Goto

Collaborating Authors

 encoder-decoder architecture



Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

arXiv.org Artificial Intelligence

Discrete diffusion models enable parallel token sampling for faster inference than autoregressive approaches. However, prior diffusion models use a decoder-only architecture, which requires sampling algorithms that invoke the full network at every denoising step and incur high computational cost. Our key insight is that discrete diffusion models perform two types of computation: 1) representing clean tokens and 2) denoising corrupted tokens, which enables us to use separate modules for each task. We propose an encoder-decoder architecture to accelerate discrete diffusion inference, which relies on an encoder to represent clean tokens and a lightweight decoder to iteratively refine a noised sequence. We also show that this architecture enables faster training of block diffusion models, which partition sequences into blocks for better quality and are commonly used in diffusion language model inference. We introduce a framework for Efficient Encoder-Decoder Diffusion (E2D2), consisting of an architecture with specialized training and sampling algorithms, and we show that E2D2 achieves superior trade-offs between generation quality and inference throughput on summarization, translation, and mathematical reasoning tasks. We provide the code, model weights, and blog post on the project page: https://m-arriola.com/e2d2


Deep Neural ODE Operator Networks for PDEs

arXiv.org Artificial Intelligence

Operator learning has emerged as a promising paradigm for developing efficient surrogate models to solve partial differential equations (PDEs). However, existing approaches often overlook the domain knowledge inherent in the underlying PDEs and hence suffer from challenges in capturing temporal dynamics and generalization issues beyond training time frames. This paper introduces a deep neural ordinary differential equation (ODE) operator network framework, termed NODE-ONet, to alleviate these limitations. The framework adopts an encoder-decoder architecture comprising three core components: an encoder that spatially discretizes input functions, a neural ODE capturing latent temporal dynamics, and a decoder reconstructing solutions in physical spaces. Theoretically, error analysis for the encoder-decoder architecture is investigated. Computationally, we propose novel physics-encoded neural ODEs to incorporate PDE-specific physical properties. Such well-designed neural ODEs significantly reduce the framework's complexity while enhancing numerical efficiency, robustness, applicability, and generalization capacity. Numerical experiments on nonlinear diffusion-reaction and Navier-Stokes equations demonstrate high accuracy, computational efficiency, and prediction capabilities beyond training time frames. Additionally, the framework's flexibility to accommodate diverse encoders/decoders and its ability to generalize across related PDE families further underscore its potential as a scalable, physics-encoded tool for scientific machine learning.



Unsupervised Speech Enhancement using Data-defined Priors

arXiv.org Artificial Intelligence

The majority of deep learning-based speech enhancement methods require paired clean-noisy speech data. Collecting such data at scale in real-world conditions is infeasible, which has led the community to rely on synthetically generated noisy speech. However, this introduces a gap between the training and testing phases. In this work, we propose a novel dual-branch encoder-decoder architecture for unsupervised speech enhancement that separates the input into clean speech and residual noise. Adversarial training is employed to impose priors on each branch, defined by unpaired datasets of clean speech and, optionally, noise. Experimental results show that our method achieves performance comparable to leading unsupervised speech enhancement approaches. Furthermore, we demonstrate the critical impact of clean speech data selection on enhancement performance. In particular, our findings reveal that performance may appear overly optimistic when in-domain clean speech data are used for prior definition -- a practice adopted in previous unsupervised speech enhancement studies.


Symmetry-Aware Transformer Training for Automated Planning

arXiv.org Artificial Intelligence

While transformers excel in many settings, their application in the field of automated planning is limited. Prior work like PlanGPT, a state-of-the-art decoder-only transformer, struggles with extrapolation from easy to hard planning problems. This in turn stems from problem symmetries: planning tasks can be represented with arbitrary variable names that carry no meaning beyond being identifiers. This causes a combinatorial explosion of equivalent representations that pure transformers cannot efficiently learn from. We propose a novel contrastive learning objective to make transformers symmetry-aware and thereby compensate for their lack of inductive bias. Combining this with architectural improvements, we show that transformers can be efficiently trained for either plan-generation or heuristic-prediction. Our results across multiple planning domains demonstrate that our symmetry-aware training effectively and efficiently addresses the limitations of PlanGPT.


Optimal Linear Baseline Models for Scientific Machine Learning

arXiv.org Artificial Intelligence

In nearly every scientific discipline, a central challenge lies in modeling, computing, and understanding the functional relationships between signals, measurements, and their underlying physical processes. These mappings typically manifest in three fundamental forms: forward modeling, inference, and autoencoding. While mathematical models often provide insight into these relationships, they are frequently inadequate for real-world prediction and analysis due to limitations in analytical tractability, computational feasibility, or algorithmic robustness. The advent of scientific machine learning (ML) has led to a paradigm shift, where data-driven methods, particularly neural networks, have emerged as powerful tools for learning complex input-output relations directly from data. Unlike traditional model based approaches, neural networks are capable of overcoming longstanding issues such as computational complexity and scalability issues, model misspecification, and the ill-posedness inherent to many scientific problems [1]. A central strength of neural networks is their capacity to project inputs into a lower-dimensional latent space before mapping to targets, a principle commonly realized in autoencoder and encoder-decoder architectures.


Domain-Adaptive Small Language Models for Structured Tax Code Prediction

arXiv.org Artificial Intelligence

Every day, multinational firms process thousands of transactions, each of which must adhere to tax regulations that vary by jurisdiction and are often nuanced. The determination of product and service tax codes, such as HSN or SAC is a major use case in Tax compliance. An accurate determination of such codes is imperative to avoid any tax penalties. This paper proposes a domain-adaptive small language model (SLM) with an encoder-decoder architecture for the enhanced prediction of product and service tax codes. In this approach, we address the problem of predicting hierarchical tax code sequences using unstructured product and services data. We employ an SLM based upon encoder-decoder architecture as this enables sequential generation of tax codes to capture the hierarchical dependencies present within the tax codes. Our experiments demonstrate that encoder-decoder SLMs can be successfully applied to the sequential prediction of structured tax codes, a domain that remains comparatively unexplored in current NLP research. In this paper, we demonstrate the superior performance of the domain-adaptive encoder-decoder SLMs over flat classifiers when applied to the Harmonized System of Nomenclature (HSN), and achieve superior results compared to decoder-only and encoder-only architectures for structured sequence generation tasks. This approach can also be scaled to other government-mandated tax commodity codes, such as United Nations Standard Products and Services Codes (UNSPSC), or Brazil's Nomenclatura Comum do Mercosul (NCM).


Hierarchical Time Series Forecasting Via Latent Mean Encoding

arXiv.org Artificial Intelligence

Coherently forecasting the behaviour of a target variable across both coarse and fine temporal scales is crucial for profit-optimized decision-making in several business applications, and remains an open research problem in temporal hierarchical forecasting. Here, we propose a new hierarchical architecture that tackles this problem by leveraging modules that specialize in forecasting the different temporal aggregation levels of interest. The architecture, which learns to encode the average behaviour of the target variable within its hidden layers, makes accurate and coherent forecasts across the target temporal hierarchies. We validate our architecture on the challenging, real-world M5 dataset and show that it outperforms established methods, such as the TSMixer model.


Generalizable Trajectory Prediction via Inverse Reinforcement Learning with Mamba-Graph Architecture

arXiv.org Artificial Intelligence

-- Accurate driving behavior modeling is fundamental to safe and efficient trajectory prediction, yet remains challenging in complex traffic scenarios. This paper presents a novel Inverse Reinforcement Learning (IRL) framework that captures human-like decision-making by inferring diverse reward functions, enabling robust cross-scenario adaptability. The learned reward function is utilized to maximize the likelihood of output by the encoder-decoder architecture that combines Mamba blocks for efficient long-sequence dependency modeling with graph attention networks to encode spatial interactions among traffic agents. Comprehensive evaluations on urban intersections and roundabouts demonstrate that the proposed method not only outperforms various popular approaches in prediction accuracy but also achieves 2 times higher generalization performance to unseen scenarios compared to other IRL-based method. Modern intelligent transportation systems (ITS) rely on accurate trajectory prediction to enhance road safety and traffic efficiency [1].