PSformer: Parameter-efficient Transformer with Segment Attention for Time Series Forecasting

Yanlong Wang, Jian Xu, Fei Ma, Shao-Lun Huang, Danny Dongning Sun, Xiao-Ping Zhang

arXiv.org Artificial Intelligence 

Time series forecasting remains a critical challenge across various domains, often complicated by high-dimensional data and long-term dependencies. This paper presents a novel transformer architecture for time series forecasting, incorporating two key innovations: parameter sharing (PS) and Spatial-Temporal Segment Attention (SegAtt). We define a time series segment as the concatenation of sequence patches taken from the same positions across different variables. The proposed model, PSformer, reduces the number of training parameters through the parameter sharing mechanism, thereby improving model efficiency and scalability. SegAtt enhances the model's ability to capture local spatio-temporal dependencies by computing attention within each segment, and improves the global representation by integrating information across segments. The combination of parameter sharing and SegAtt significantly improves forecasting performance. Extensive experiments on benchmark datasets demonstrate that PSformer outperforms popular baselines and other transformer-based approaches in terms of accuracy and scalability, establishing it as an accurate and scalable tool for time series forecasting.

With the advancement of artificial intelligence techniques, significant efforts have been devoted to developing innovative models that continue to improve prediction performance (Liang et al., 2024; Wang et al., 2024). In particular, the transformer-based model family has recently attracted increasing attention owing to its proven success in natural language processing (OpenAI et al., 2024) and computer vision (Liu et al., 2021; Dosovitskiy et al., 2021). Moreover, pre-trained large models based on the transformer architecture have shown advantages in time series forecasting (Liu et al., 2024a; Jin et al., 2024; Chang et al., 2023; Woo et al., 2024), demonstrating that increasing the number of parameters in transformer models and the volume of training data can effectively enhance model capability.
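As a rough illustration of the segment definition above, the following Python sketch (not the authors' implementation; the helper name "build_segments", the tensor shapes, and the parameter "patch_len" are assumptions) shows how patches taken from the same positions across variables can be concatenated into segments, over which segment attention would then be computed.

    # Minimal sketch of segment construction, assuming input shape (batch, num_vars, seq_len).
    import torch

    def build_segments(x: torch.Tensor, patch_len: int) -> torch.Tensor:
        """Return segments of shape (batch, num_segments, num_vars * patch_len)."""
        b, c, l = x.shape
        assert l % patch_len == 0, "sequence length must be divisible by the patch length"
        n = l // patch_len                       # patches per variable = number of segments
        patches = x.reshape(b, c, n, patch_len)  # (batch, num_vars, num_segments, patch_len)
        segments = patches.permute(0, 2, 1, 3)   # group same-position patches across variables
        return segments.reshape(b, n, c * patch_len)

    # Example: 7 variables, look-back window of 96, patch length 16 -> 6 segments of size 7*16.
    x = torch.randn(32, 7, 96)
    segs = build_segments(x, patch_len=16)
    print(segs.shape)  # torch.Size([32, 6, 112])

Under these assumptions, each segment mixes spatial (cross-variable) and temporal (within-patch) information, which is what allows attention over segments to capture local spatio-temporal dependencies as described above.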