Dynamic Relational Priming Improves Transformer in Multivariate Time Series
arXiv.org Artificial Intelligence
Standard attention mechanisms in transformers employ static token representations that remain unchanged across all pair-wise computations in each layer. This limits their representational alignment with the potentially diverse relational dynamics of each token-pair interaction. While standard attention excels in domains with relatively homogeneous relationships, its static relational learning struggles to capture the diverse, heterogeneous inter-channel dependencies of multivariate time series (MTS) data, where different channel-pair interactions within a single system may be governed by entirely different physical laws or temporal dynamics. To better align the attention mechanism with such domain phenomena, we propose attention with dynamic relational priming (prime attention). Unlike standard attention, where each token presents an identical representation across all of its pair-wise interactions, prime attention tailors each token dynamically, per interaction, through learnable modulations, optimizing each pair-wise computation for the unique relational dynamics of that specific token pair. This representational plasticity enables effective extraction of relationship-specific information in MTS while maintaining the same asymptotic computational complexity as standard attention. Our results demonstrate that prime attention consistently outperforms standard attention across benchmarks, achieving up to 6.5% improvement in forecasting accuracy. In addition, we find that prime attention achieves comparable or superior performance using up to 40% less sequence length than standard attention, further demonstrating its superior relational modeling capability.

By "static" we mean that token representations in each layer are fixed relative to all other tokens throughout pair-wise modeling; we classify this property of standard attention mechanisms as static relational learning.

An important challenge in applying transformers to MTS stems from domain mismatch. In language modeling, token relationships are predominantly semantic, so most critical patterns can be captured by simple weighted sums of token representations. Similarly, in computer vision, spatial relationships dominate, enabling attention mechanisms to focus on regions of interest through uniform spatial reasoning. Learning on graphs exhibits comparable homogeneity: node relationships are fundamentally structural and connectivity-based, allowing standard attention to model interactions through meaningful topological patterns, sometimes separated by relationship type (Schlichtkrull et al., 2018; Hu et al., 2020; Wang et al., 2019).
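The abstract describes the mechanism only at a high level, so the following is a minimal sketch, assuming prime attention is realized as a learnable, query-conditioned gate that re-weights each key's features before the dot product; the class name `PrimeAttention`, the `gate_proj` layer, and the sigmoid gating are hypothetical choices for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrimeAttention(nn.Module):
    """Single-head attention with a hypothetical per-pair key modulation."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Learnable modulation (an assumption): each query produces a gate
        # that re-weights key features, so every (query, key) pair sees a
        # key tailored to that specific interaction.
        self.gate_proj = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_tokens, d_model), e.g. one token per MTS channel.
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        gate = torch.sigmoid(self.gate_proj(q))  # (batch, n, d), one gate per query
        # Pair-specific key k'_ij = gate_i * k_j, realized implicitly:
        # <q_i * gate_i, k_j> == <q_i, gate_i * k_j>, so scores stay O(n^2 d).
        scores = torch.einsum("bid,bjd->bij", q * gate, k) * self.scale
        attn = F.softmax(scores, dim=-1)
        return torch.einsum("bij,bjd->bid", attn, v)

x = torch.randn(2, 8, 64)   # batch of 2, 8 channel tokens, width 64
out = PrimeAttention(64)(x)
print(out.shape)            # torch.Size([2, 8, 64])
```

Because the gate depends only on the query, the pair-specific key never has to be materialized as an n x n x d tensor, which is consistent with the abstract's claim that prime attention preserves the asymptotic cost of standard attention.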
Sep-16-2025