Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

Neural Information Processing Systems

Extending the forecasting time is a critical demand for real applications, such as extreme weather early warning and long-term energy consumption planning. This paper studies the long-term forecasting problem of time series. Prior Transformer-based models adopt various self-attention mechanisms to discover the long-range dependencies. However, intricate temporal patterns of the long-term future prohibit the model from finding reliable dependencies. Also, Transformers have to adopt the sparse versions of point-wise self-attentions for long series efficiency, resulting in the information utilization bottleneck.
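The series-decomposition idea at the core of this architecture can be sketched with a simple moving average: the trend is the smoothed series and the seasonal part is the residual. The kernel size and edge-padding scheme below are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def series_decomp(x, kernel_size=25):
    """Split a 1-D series into seasonal and trend parts via moving average.

    Sketch of a decomposition block: the trend is a moving average (with
    edge padding so the output keeps the input length) and the seasonal
    component is whatever remains after removing it.
    """
    pad = kernel_size // 2
    # replicate the endpoints so the valid-mode convolution keeps length
    padded = np.concatenate([np.full(pad, x[0]), x, np.full(pad, x[-1])])
    kernel = np.ones(kernel_size) / kernel_size
    trend = np.convolve(padded, kernel, mode="valid")
    seasonal = x - trend
    return seasonal, trend

t = np.arange(200)
x = 0.05 * t + np.sin(2 * np.pi * t / 24)   # linear trend + daily cycle
seasonal, trend = series_decomp(x)
print(seasonal.shape, trend.shape)          # both (200,)
```

By construction the two parts sum back to the input, so the block can be applied repeatedly inside a deeper model without losing information.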


Beyond MSE: Ordinal Cross-Entropy for Probabilistic Time Series Forecasting

Wang, Jieting, Shi, Huimei, Li, Feijiang, Shang, Xiaolei

arXiv.org Artificial Intelligence

Time series forecasting is an important task that involves analyzing temporal dependencies and underlying patterns (such as trends, cyclicality, and seasonality) in historical data to predict future values or trends. Current deep learning-based forecasting models primarily employ Mean Squared Error (MSE) loss functions for regression modeling. Despite enabling direct value prediction, this method offers no uncertainty estimation and exhibits poor outlier robustness. To address these limitations, we propose OCE-TS, a novel ordinal classification approach for time series forecasting that replaces MSE with Ordinal Cross-Entropy (OCE) loss, preserving prediction order while quantifying uncertainty through probability output. Specifically, OCE-TS begins by discretizing observed values into ordered intervals and deriving their probabilities via a parametric distribution as supervision signals. Using a simple linear model, we then predict probability distributions for each timestep. The OCE loss is computed between the cumulative distributions of predicted and ground-truth probabilities, explicitly preserving ordinal relationships among forecasted values. Through theoretical analysis using influence functions, we establish that cross-entropy (CE) loss exhibits superior stability and outlier robustness compared to MSE loss. Empirically, we compared OCE-TS with five baseline models (Autoformer, DLinear, iTransformer, TimeXer, and TimeBridge) on seven public time series datasets. Using MSE and Mean Absolute Error (MAE) as evaluation metrics, the results demonstrate that OCE-TS consistently outperforms benchmark models. The code is publicly available at: https://github.com/Shi-hm/OCE-TS.
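A cross-entropy over cumulative distributions can be sketched as follows. The binary cross-entropy form over cumulative bin probabilities is an illustrative assumption, not necessarily the paper's exact loss, but it shows why such a loss respects bin order: mass placed far from the true bin shifts the predicted CDF more than mass in an adjacent bin.

```python
import numpy as np

def ordinal_cross_entropy(p_pred, p_true, eps=1e-9):
    """Ordinal cross-entropy over K ordered bins (illustrative form).

    Both inputs are probability vectors over ordered intervals; the loss
    is a binary cross-entropy between their cumulative distributions, so
    errors in distant bins cost more than errors in neighboring bins,
    unlike plain categorical cross-entropy.
    """
    F_pred = np.clip(np.cumsum(p_pred), eps, 1 - eps)
    F_true = np.cumsum(p_true)
    return -np.mean(F_true * np.log(F_pred)
                    + (1 - F_true) * np.log(1 - F_pred))

p_true = np.array([0.0, 0.0, 1.0, 0.0, 0.0])   # value falls in bin 2
near = np.array([0.0, 0.1, 0.8, 0.1, 0.0])      # mass near the true bin
far  = np.array([0.8, 0.1, 0.1, 0.0, 0.0])      # same mass, far away
print(ordinal_cross_entropy(near, p_true) < ordinal_cross_entropy(far, p_true))  # True
```

The comparison at the end is the key property: both predictions put 0.8 probability in a single wrong-or-right region, but the distant one is penalized much more heavily.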


Lightweight and Data-Efficient MultivariateTime Series Forecasting using Residual-Stacked Gaussian (RS-GLinear) Architecture

Ali, Abukar

arXiv.org Artificial Intelligence

Following the success of Transformer architectures and their self-attention mechanism in language modelling, particularly their ability to capture long-range dependencies, many researchers have explored how these architectures can be adopted for time-series forecasting. Variants of Transformer-based models have been proposed to handle both short- and long-term sequence modeling, aiming to predict future time-dependent values from historical observations using varying input window sizes. However, despite the popularity of leveraging Transformer architectures to extract temporal relationships from sets of continuous datapoints, their performance in time-series forecasting has shown mixed results. Several researchers, including Zeng et al. (2022) and Rizvi et al. (2025), have challenged the reliability of emerging Transformer-based solutions for long-term forecasting tasks. In this research, our first objective is to evaluate the Gaussian-based Linear (GLinear) architecture proposed by Rizvi et al. (2025) and to develop an enhanced version of it, referred to in this study as the Residual Stacked GLinear (RS-GLinear) model. The second objective is to assess the broader applicability of the RS-GLinear model by extending its use to additional domains, financial time series and epidemiological data, which were not explored in the baseline model proposed by Rizvi et al. (2025). Most time-series implementations (Transformer-based and linear models) we came across commonly adopt baseline codebases provided by the Hugging Face repository, including our baseline GLinear model used in this study. Therefore, the RS-GLinear model developed in this study is an extended version of the codebase introduced in the research by Rizvi et al. (2025).
Keywords: Multivariate Time Series Forecasting, Transformer-based models, Weather, Influenza-like Illness, Deep Learning, Transformer-based architecture, Residual-Stacked GLinear, Neural Network. Time series forecasting has been an important research area in many domains such as finance/economics, retail, healthcare, cloud infrastructure, meteorology, and traffic management (Toner et al. 2024). Since the introduction of the Transformer model (Vaswani et al. 2017), there has been a large amount of research focusing on time-series forecasting using Large Language Models (LLMs) to leverage LLMs' sequential dependencies in text generation (Tan et al. 2024).
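The residual-stacking idea can be illustrated structurally: several linear blocks each map the same lookback window to the horizon, and their outputs are summed as additive corrections. The untrained random weights and the additive combination below are illustrative assumptions, not the actual RS-GLinear architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_block(lookback, horizon):
    """One linear map from the input window to the forecast horizon."""
    W = rng.normal(scale=0.1, size=(horizon, lookback))
    b = np.zeros(horizon)
    return lambda x: W @ x + b

def residual_stacked_forecast(x, blocks):
    """Sum the forecasts of a stack of linear blocks.

    Structural sketch only: each block sees the same input window and
    contributes an additive (residual) correction to the running
    forecast; in a trained model the weights would be learned jointly.
    """
    forecast = np.zeros_like(blocks[0](x))
    for block in blocks:
        forecast = forecast + block(x)
    return forecast

lookback, horizon = 96, 24
blocks = [linear_block(lookback, horizon) for _ in range(3)]
y_hat = residual_stacked_forecast(rng.normal(size=lookback), blocks)
print(y_hat.shape)   # (24,)
```

The appeal of such stacks is that each block stays a cheap linear map, so depth adds capacity without the quadratic cost of attention.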


Quantum-Optimized Selective State Space Model for Efficient Time Series Prediction

Jura, Stefan-Alexandru, Udrescu, Mihai, Topirceanu, Alexandru

arXiv.org Artificial Intelligence

Long-range time series forecasting remains challenging, as it requires capturing non-stationary and multi-scale temporal dependencies while maintaining noise robustness, efficiency, and stability. Transformer-based architectures such as Autoformer and Informer improve generalization but suffer from quadratic complexity and degraded performance on very long time horizons. State space models, notably S-Mamba, provide linear-time updates but often face unstable training dynamics, sensitivity to initialization, and limited robustness for multivariate forecasting. To address such challenges, we propose the Quantum-Optimized Selective State Space Model (Q-SSM), a hybrid quantum-optimized approach that integrates state space dynamics with a variational quantum gate. Instead of relying on expensive attention mechanisms, Q-SSM employs a simple parametrized quantum circuit (RY-RX ansatz) whose expectation values regulate memory updates adaptively. This quantum gating mechanism improves convergence stability, enhances the modeling of long-term dependencies, and provides a lightweight alternative to attention. We empirically validate Q-SSM on three widely used benchmarks, i.e., ETT, Traffic, and Exchange Rate. Results show that Q-SSM consistently improves over strong baselines (LSTM, TCN, Reformer), Transformer-based models, and S-Mamba. These findings demonstrate that variational quantum gating can address current limitations in long-range forecasting, leading to accurate and robust multivariate predictions.
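The gating idea can be simulated classically. The abstract names an RY-RX ansatz whose expectation values regulate memory updates; the single-qubit scope and the convex-combination state update below are sketch-level assumptions, not the paper's circuit.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rx(phi):
    """Single-qubit rotation about the X axis."""
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    return np.array([[c, -1j * s], [-1j * s, c]], dtype=complex)

def quantum_gate(theta, phi):
    """Expectation <Z> of an RY-RX ansatz on |0>, rescaled to [0, 1].

    Classical simulation of the circuit; the expectation value plays the
    role of a learned gate controlling how much memory is kept.
    """
    psi = rx(phi) @ ry(theta) @ np.array([1.0, 0.0], dtype=complex)
    z_exp = abs(psi[0]) ** 2 - abs(psi[1]) ** 2
    return 0.5 * (z_exp + 1.0)

def gated_update(h, u, theta, phi):
    """Convex combination of previous state h and new input u."""
    g = quantum_gate(theta, phi)
    return g * h + (1.0 - g) * u

h = gated_update(h=1.0, u=0.0, theta=0.0, phi=0.0)
print(h)   # 1.0: the identity circuit keeps the old state entirely
```

Because the gate is an expectation value, it is automatically bounded, which is one plausible reason such gating could stabilize training compared to unconstrained learned gates.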




Frequency-Constrained Learning for Long-Term Forecasting

Kong, Menglin, Zheng, Vincent Zhihao, Sun, Lijun

arXiv.org Artificial Intelligence

Many real-world time series are driven by recurring periodic structure; however, modern deep forecasting models often fail to capture these recurring patterns due to spectral bias and a lack of frequency-aware inductive priors. Motivated by this gap, we propose a simple yet effective method that enhances long-term forecasting by explicitly modeling periodicity through spectral initialization and frequency-constrained optimization. Specifically, we extract dominant low-frequency components via Fast Fourier Transform (FFT)-guided coordinate descent, initialize sinusoidal embeddings with these components, and employ a two-speed learning schedule to preserve meaningful frequency structure during training. Our approach is model-agnostic and integrates seamlessly into existing Transformer-based architectures. Extensive experiments across diverse real-world benchmarks demonstrate consistent performance gains, particularly at long horizons, highlighting the benefits of injecting spectral priors into deep temporal models for robust and interpretable long-range forecasting. Moreover, on synthetic data, our method accurately recovers ground-truth frequencies, further validating its interpretability and effectiveness in capturing latent periodic patterns.
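The extraction of dominant frequency components can be sketched with a plain FFT peak pick; the top-k magnitude selection below is an illustrative stand-in for the paper's coordinate-descent procedure, and the returned frequencies and amplitudes are the kind of quantities that could seed sinusoidal embeddings.

```python
import numpy as np

def dominant_frequencies(x, k=3):
    """Pick the k strongest nonzero frequencies of a real series via FFT.

    Returns frequencies in cycles per step and the corresponding sinusoid
    amplitudes, ordered from strongest to weakest.
    """
    n = len(x)
    spectrum = np.fft.rfft(x - x.mean())
    freqs = np.fft.rfftfreq(n)
    mags = np.abs(spectrum)
    mags[0] = 0.0                      # ignore the DC component
    top = np.argsort(mags)[-k:][::-1]  # indices of the k largest magnitudes
    return freqs[top], 2.0 * mags[top] / n

t = np.arange(1024)
x = np.sin(2 * np.pi * t / 64) + 0.5 * np.sin(2 * np.pi * t / 16)
freqs, amps = dominant_frequencies(x, k=2)
print(freqs)   # [0.015625 0.0625]: periods 64 and 16, strongest first
```

On this synthetic input with exact integer periods, the recovered frequencies and amplitudes match the generating sinusoids, mirroring the frequency-recovery experiment the abstract describes.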


A Review of the Long Horizon Forecasting Problem in Time Series Analysis

Krupakar, Hans, A, Kandappan V

arXiv.org Machine Learning

The long horizon forecasting (LHF) problem has appeared in the time series literature over the last 35 years or so. This review covers aspects of LHF in this period and how deep learning has incorporated variants of trend, seasonality, Fourier and wavelet transforms, misspecification bias reduction, and bandpass filters, while contributing convolutions, residual connections, sparsity reduction, strided convolutions, attention masks, SSMs, normalization methods, low-rank approximations, and gating mechanisms. We highlight time series decomposition techniques, input data preprocessing, and dataset windowing schemes that improve performance. Multi-layer perceptron models, recurrent neural network hybrids, and self-attention models that improve and/or address the performance of the LHF problem are described, with an emphasis on feature space construction. Ablation studies are conducted over the ETTm2 dataset in the multivariate and univariate high useful load (HUFL) forecasting contexts, evaluated over the last 4 months of the dataset. Heatmaps of the MSE averaged per time step over test set series show a steady increase in error proportional to the horizon length, except with the xLSTM and Triformer models, motivating LHF as an error propagation problem. The trained models are available here: https://bit.ly/LHFModelZoo
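The per-time-step error curves behind such heatmaps are simple to compute. This is a minimal sketch; the synthetic data with noise that grows along the horizon only illustrates the error-propagation pattern the review observes, not any real model's behavior.

```python
import numpy as np

def per_step_mse(y_true, y_pred):
    """Average squared error at each horizon step across test windows.

    y_true, y_pred: arrays of shape (n_windows, horizon). A curve rising
    with the horizon index is the error-propagation pattern that per-step
    MSE heatmaps visualize.
    """
    return ((y_true - y_pred) ** 2).mean(axis=0)

rng = np.random.default_rng(1)
y_true = rng.normal(size=(100, 24))
# synthetic forecasts whose error grows with the horizon step
y_pred = y_true + rng.normal(size=(100, 24)) * np.linspace(0.1, 1.0, 24)
curve = per_step_mse(y_true, y_pred)
print(curve[0] < curve[-1])   # True: error grows with horizon length
```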


Transformer-Based Decomposition of Electrodermal Activity for Real-World Mental Health Applications

Tsirmpas, Charalampos, Konstantopoulos, Stasinos, Andrikopoulos, Dimitris, Kyriakouli, Konstantina, Fatouros, Panagiotis

arXiv.org Artificial Intelligence

Decomposing Electrodermal Activity (EDA) into phasic (short-term, stimulus-linked responses) and tonic (longer-term baseline) components is essential for extracting meaningful emotional and physiological biomarkers. This study presents a comparative analysis of knowledge-driven, statistical, and deep learning-based methods for EDA signal decomposition, with a focus on in-the-wild data collected from wearable devices. In particular, the authors introduce the Feel Transformer, a novel Transformer-based model adapted from the Autoformer architecture, designed to separate phasic and tonic components without explicit supervision. The model leverages pooling and trend-removal mechanisms to enforce physiologically meaningful decompositions. Comparative experiments against methods such as Ledalab, cvxEDA, and conventional detrending show that the Feel Transformer achieves a balance between feature fidelity (SCR frequency, amplitude, and tonic slope) and robustness to noisy, real-world data. The model demonstrates potential for real-time biosignal analysis and future applications in stress prediction, digital mental health interventions, and physiological forecasting.
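The tonic/phasic split itself can be illustrated with a classical baseline rather than the Feel Transformer: a rolling lower-percentile envelope stands in for the slow tonic level (echoing the pooling and trend-removal idea), and the remainder captures stimulus-linked responses. The window length, percentile, and synthetic signal below are illustrative assumptions.

```python
import numpy as np

def decompose_eda(signal, win=40):
    """Split an EDA trace into tonic baseline and phasic responses.

    Illustrative baseline only: the tonic level is a rolling 10th-percentile
    envelope of the signal, and the phasic component is the remainder, so
    short upward deflections (skin conductance responses) end up phasic.
    """
    half = win // 2
    tonic = np.empty_like(signal)
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        tonic[i] = np.percentile(signal[lo:hi], 10)
    phasic = signal - tonic
    return phasic, tonic

t = np.linspace(0, 60, 600)                    # 60 s at 10 Hz
baseline = 2.0 + 0.01 * t                      # slowly drifting tonic level
scr = np.where((t > 20) & (t < 25), 0.5, 0.0)  # one stimulus-linked response
phasic, tonic = decompose_eda(baseline + scr)
print(phasic.max() > 0.4)   # True: the SCR lands in the phasic component
```

Simple envelopes like this are exactly the kind of conventional detrending the study compares the Transformer-based decomposition against.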