itransformer
DecompKAN: Decomposed Patch-KAN for Long-Term Time Series Forecasting
Accurate time series forecasting in scientific domains such as climate modeling, physiological monitoring, and energy systems benefits from both competitive predictions and model transparency: practitioners value understanding how a model transforms temporal features, not merely what it predicts. Transformer-based models achieve strong accuracy but their attention weights reveal only token-level relevance, not the functional transformations applied to each feature. This work proposes DECOMPKAN, a lightweight attention-free architecture that combines trend-residual decomposition, channel-wise patching, learned instance normalization, and B-spline Kolmogorov-Arnold Network (KAN) edge functions. Each KAN edge learns an explicit, inspectable 1D scalar function ฯ(x) over learned patch-embedding coordinates that can be directly visualized, offering a form of architectural transparency not directly available in attention-based or MLP-based architectures. On standard benchmarks, DECOMPKAN achieves best or tied-best MSE on 15 of 32 dataset-horizon combinations among selected published baselines, and achieves best or tied-best MSE on 20 of 36 comparisons (25 of 36 MAE; ties counted for all tied models) under a controlled same-recipe evaluation across 9 datasets including the physiological PPG-DaLiA benchmark. The architecture shows particular strength on datasets with smooth temporal dynamics (Solar 17%, ECL 10%vs.
RG-TTA: Regime-Guided Meta-Control for Test-Time Adaptation in Streaming Time Series
Kumar, Indar, Tiwari, Akanksha, Jasti, Sai Krishna, Lade, Ankit Hemant
Test-time adaptation (TTA) enables neural forecasters to adapt to distribution shifts in streaming time series, but existing methods apply the same adaptation intensity regardless of the nature of the shift. We propose Regime-Guided Test-Time Adaptation (RG-TTA), a meta-controller that continuously modulates adaptation intensity based on distributional similarity to previously-seen regimes. Using an ensemble of Kolmogorov-Smirnov, Wasserstein-1, feature-distance, and variance-ratio metrics, RG-TTA computes a similarity score for each incoming batch and uses it to (i) smoothly scale the learning rate -- more aggressive for novel distributions, conservative for familiar ones -- and (ii) control gradient effort via loss-driven early stopping rather than fixed budgets, allowing the system to allocate exactly the effort each batch requires. As a supplementary mechanism, RG-TTA gates checkpoint reuse from a regime memory, loading stored specialist models only when they demonstrably outperform the current model (loss improvement >= 30%). RG-TTA is model-agnostic and strategy-composable: it wraps any forecaster exposing train/predict/save/load interfaces and enhances any gradient-based TTA method. We demonstrate three compositions -- RG-TTA, RG-EWC, and RG-DynaTTA -- and evaluate 6 update policies (3 baselines + 3 regime-guided variants) across 4 compact architectures (GRU, iTransformer, PatchTST, DLinear), 14 datasets (6 real-world multivariate benchmarks + 8 synthetic regime scenarios), and 4 forecast horizons (96, 192, 336, 720) under a streaming evaluation protocol with 3 random seeds (672 experiments total). Regime-guided policies achieve the lowest MSE in 156 of 224 seed-averaged experiments (69.6%), with RG-EWC winning 30.4% and RG-TTA winning 29.0%. Overall, RG-TTA reduces MSE by 5.7% vs TTA while running 5.5% faster; RG-EWC reduces MSE by 14.1% vs standalone EWC.
APT: Affine Prototype-Timestamp For Time Series Forecasting Under Distribution Shift
Li, Yujie, Shao, Zezhi, Yu, Chengqing, Fu, Yisong, Sun, Tao, Xu, Yongjun, Wang, Fei
Time series forecasting under distribution shift remains challenging, as existing deep learning models often rely on local statistical normalization (e.g., mean and variance) that fails to capture global distribution shift. Methods like RevIN and its variants attempt to decouple distribution and pattern but still struggle with missing values, noisy observations, and invalid channel-wise affine transformation. To address these limitations, we propose Affine Prototype-Timestamp (APT), a lightweight and flexible plug-in module that injects global distribution features into the normalization-forecasting pipeline. By leveraging timestamp-conditioned prototype learning, APT dynamically generates affine parameters that modulate both input and output series, enabling the backbone to learn from self-supervised, distribution-aware clustered instances. APT is compatible with arbitrary forecasting backbones and normalization strategies while introducing minimal computational overhead. Extensive experiments across six benchmark datasets and multiple backbone-normalization combinations demonstrate that APT significantly improves forecasting performance under distribution shift.
MDMLP-EIA: Multi-domain Dynamic MLPs with Energy Invariant Attention for Time Series Forecasting
Zhang, Hu, Dai, Zhien, Tang, Zhaohui, Xie, Yongfang
Time series forecasting is essential across diverse domains. While MLP-based methods have gained attention for achieving Transformer-comparable performance with fewer parameters and better robustness, they face critical limitations including loss of weak seasonal signals, capacity constraints in weight-sharing MLPs, and insufficient channel fusion in channel-independent strategies. To address these challenges, we propose MDMLP-EIA (Multi-domain Dynamic MLPs with Energy Invariant Attention) with three key innovations. First, we develop an adaptive fused dual-domain seasonal MLP that categorizes seasonal signals into strong and weak components. It employs an adaptive zero-initialized channel fusion strategy to minimize noise interference while effectively integrating predictions. Second, we introduce an energy invariant attention mechanism that adaptively focuses on different feature channels within trend and seasonal predictions across time steps. This mechanism maintains constant total signal energy to align with the decomposition-prediction-reconstruction framework and enhance robustness against disturbances. Third, we propose a dynamic capacity adjustment mechanism for channel-independent MLPs. This mechanism scales neuron count with the square root of channel count, ensuring sufficient capacity as channels increase. Extensive experiments across nine benchmark datasets demonstrate that MDMLP-EIA achieves state-of-the-art performance in both prediction accuracy and computational efficiency.
CometNet: Contextual Motif-guided Long-term Time Series Forecasting
Wang, Weixu, Zhou, Xiaobo, Qiao, Xin, Wang, Lei, Qiu, Tie
Long-term Time Series Forecasting is crucial across numerous critical domains, yet its accuracy remains fundamentally constrained by the receptive field bottleneck in existing models. Mainstream Transformer-and Multi-layer Per-ceptron (MLP)-based methods mainly rely on finite look-back windows, limiting their ability to model long-term dependencies and hurting forecasting performance. Naively extending the look-back window proves ineffective, as it not only introduces prohibitive computational complexity, but also drowns vital long-term dependencies in historical noise. To address these challenges, we propose CometNet, a novel Contextual Motif-guided Long-term Time Series Forecasting framework. CometNet first introduces a Contextual Motif Extraction module that identifies recurrent, dominant contextual motifs from complex historical sequences, providing extensive temporal dependencies far exceeding limited look-back windows; Subsequently, a Motif-guided Forecasting module is proposed, which integrates the extracted dominant motifs into forecasting. By dynamically mapping the look-back window to its relevant motifs, CometNet effectively harnesses their contextual information to strengthen long-term forecasting capability. Extensive experimental results on eight real-world datasets have demonstrated that CometNet significantly outperforms current state-of-the-art (SOT A) methods, particularly on extended forecast horizons.
IBMA: An Imputation-Based Mixup Augmentation Using Self-Supervised Learning for Time Series Data
Nguyen, Dang Nha, Nguyen, Hai Dang, Nguyen, Khoa Tho Anh
Data augmentation plays a crucial role in enhancing model performance across various AI fields by introducing variability while maintaining the underlying temporal patterns. However, in the context of long sequence time series data, where maintaining temporal consistency is critical, there are fewer augmentation strategies compared to fields such as image or text, with advanced techniques like Mixup rarely being used. In this work, we propose a new approach, Imputation-based Mixup Augmentation (IMA), which combines Imputed-data Augmentation with Mixup Augmentation to bolster model generalization and improve forecasting performance. We evaluate the effectiveness of this method across several forecasting models, including DLinear (MLP), TimesNet (CNN), and iTrainformer (Transformer), these models represent some of the most recent advances in long sequence time series forecasting. Our experiments, conducted on three datasets (ETT -small, Illness, Exchange Rate) from various domains and compared against eight other augmentation techniques, demonstrate that IMA consistently enhances performance, achieving 22 improvements out of 24 instances, with 10 of those being the best performances, particularly with iTrain-former imputation in ETT dataset. The GitHub repository is available at: https://github.com/dangnha/IMA.
Selective Learning for Deep Time Series Forecasting
Fu, Yisong, Shao, Zezhi, Yu, Chengqing, Li, Yujie, An, Zhulin, Wang, Qi, Xu, Yongjun, Wang, Fei
Benefiting from high capacity for capturing complex temporal patterns, deep learning (DL) has significantly advanced time series forecasting (TSF). However, deep models tend to suffer from severe overfitting due to the inherent vulnerability of time series to noise and anomalies. The prevailing DL paradigm uniformly optimizes all timesteps through the MSE loss and learns those uncertain and anomalous timesteps without difference, ultimately resulting in overfitting. To address this, we propose a novel selective learning strategy for deep TSF. Specifically, selective learning screens a subset of the whole timesteps to calculate the MSE loss in optimization, guiding the model to focus on generalizable timesteps while disregarding non-generalizable ones. Our framework introduces a dual-mask mechanism to target timesteps: (1) an uncertainty mask leveraging residual entropy to filter uncertain timesteps, and (2) an anomaly mask employing residual lower bound estimation to exclude anomalous timesteps. Extensive experiments across eight real-world datasets demonstrate that selective learning can significantly improve the predictive performance for typical state-of-the-art deep models, including 37.4% MSE reduction for Informer, 8.4% for TimesNet, and 6.5% for iTransformer.
Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency
Liang, Renzhao, Xu, Sizhe, Xie, Chenggang, Chen, Jingru, Ren, Feiyang, Yang, Shu, Yabe, Takahiro
Time series forecasting plays a pivotal role in critical domains such as energy management and financial markets. Although deep learning-based approaches (e.g., MLP, RNN, Transformer) have achieved remarkable progress, the prevailing "long-sequence information gain hypothesis" exhibits inherent limitations. Through systematic experimentation, this study reveals a counterintuitive phenomenon: appropriately truncating historical data can paradoxically enhance prediction accuracy, indicating that existing models learn substantial redundant features (e.g., noise or irrelevant fluctuations) during training, thereby compromising effective signal extraction. Building upon information bottleneck theory, we propose an innovative solution termed Adaptive Masking Loss with Representation Consistency (AMRC), which features two core components: 1) Dynamic masking loss, which adaptively identified highly discriminative temporal segments to guide gradient descent during model training; 2) Representation consistency constraint, which stabilized the mapping relationships among inputs, labels, and predictions. Experimental results demonstrate that AMRC effectively suppresses redundant feature learning while significantly improving model performance. This work not only challenges conventional assumptions in temporal modeling but also provides novel theoretical insights and methodological breakthroughs for developing efficient and robust forecasting models.
Benchmarking M-LTSF: Frequency and Noise-Based Evaluation of Multivariate Long Time Series Forecasting Models
Janรen, Nick, Schaller, Melanie, Rosenhahn, Bodo
Abstract--Understanding the robustness of deep learning models for multivariate long-term time series forecasting (ML TSF) remains challenging, as evaluations typically rely on real-world datasets with unknown noise properties. We propose a simulation-based evaluation framework that generates parameterizable synthetic datasets, where each dataset instance corresponds to a different configuration of signal components, noise types, signal-to-noise ratios, and frequency characteristics. These configurable components aim to model real-world multivariate time series data without the ambiguity of unknown noise. This framework enables fine-grained, systematic evaluation of M-L TSF models under controlled and diverse scenarios. Our analysis reveals that all models degrade severely when lookback windows cannot capture complete periods of seasonal patters in the data. S-Mamba and Autoformer perform best on sawtooth patterns, while R-Linear and iTransformer favor sinusoidal signals. White and Brownian noise universally degrade performance with lower signal-to-noise ratio while S-Mamba shows specific trend-noise and iTransformer shows seasonal-noise vulnerability. Further spectral analysis shows that S-Mamba and iTransformer achieve superior frequency reconstruction. This controlled approach, based on our synthetic and principle-driven testbed, offers deeper insights into model-specific strengths and limitations through the aggregation of MSE scores and provides concrete guidance for model selection based on signal characteristics and noise conditions. IME series forecasting plays a crucial role across diverse fields such as energy systems [1]-[3], meteorology [4], [5], traffic flow modeling [6], [7] or the modeling of sensor networks [8], [9]. Reliable forecasts support proactive decision-making, effective risk management, and efficient planning. As high-resolution temporal data becomes increasingly available, the need for robust and scalable forecasting models has grown more important than ever. A time series represents data points ordered in time and can be categorized as either univariate, when consisting of a single variable, or multivariate, when involving multiple interdependent variables [10].
Accelerating Long-Term Molecular Dynamics with Physics-Informed Time-Series Forecasting
Le, Hung, Abbas, Sherif, Nguyen, Minh Hoang, Do, Van Dai, Nguyen, Huu Hiep, Nguyen, Dung
Efficient molecular dynamics (MD) simulation is vital for understanding atomic-scale processes in materials science and biophysics. Traditional density functional theory (DFT) methods are computationally expensive, which limits the feasibility of long-term simulations. We propose a novel approach that formulates MD simulation as a time-series forecasting problem, enabling advanced forecasting models to predict atomic trajectories via displacements rather than absolute positions. We incorporate a physics-informed loss and inference mechanism based on DFT-parametrised pair-wise Morse potential functions that penalize unphysical atomic proximity to enforce physical plausibility. Our method consistently surpasses standard baselines in simulation accuracy across diverse materials. The results highlight the importance of incorporating physics knowledge to enhance the reliability and precision of atomic trajectory forecasting. Remarkably, it enables stable modeling of thousands of MD steps in minutes, offering a scalable alternative to costly DFT simulations.