Towards Generalisable Time Series Understanding Across Domains

Turgut, Özgün, Müller, Philip, Menten, Martin J., Rueckert, Daniel

Oct-9-2024–arXiv.org Artificial Intelligence

In natural language processing and computer vision, self-supervised pre-training on large datasets unlocks foundational model capabilities across domains and tasks. However, this potential has not yet been realised in time series analysis, where existing methods disregard the heterogeneous nature of time series characteristics. Time series are prevalent in many domains, including medicine, engineering, natural sciences, and finance, but their characteristics vary significantly in terms of variate count, inter-variate relationships, temporal dynamics, and sampling frequency. This inherent heterogeneity across domains prevents effective pre-training on large time series corpora. To address this issue, we introduce OTiS, an open model for general time series analysis that has been specifically designed to handle multi-domain heterogeneity. We propose a novel pre-training paradigm including a tokeniser with learnable domain-specific signatures, a dual masking strategy to capture temporal causality, and a normalised cross-correlation loss to model long-range dependencies. Our model is pre-trained on a large corpus of 640,187 samples and 11 billion time points spanning 8 distinct domains, enabling it to analyse time series from any (unseen) domain. In comprehensive experiments across 15 diverse applications, including classification, regression, and forecasting, OTiS showcases its ability to accurately capture domain-specific data characteristics and demonstrates its competitiveness against state-of-the-art baselines. Our code and pre-trained weights are publicly available at https://github.com/oetu/otis.

Self-supervised pre-training paradigms are designed to account for the specific properties of language (Radford et al., 2018; Touvron et al., 2023; Chowdhery et al., 2023) or imaging (Zhou et al., 2022; Cherti et al., 2023; Oquab et al., 2024), unlocking foundational model capabilities that apply to a wide range of domains and downstream tasks. This potential, however, remains largely unrealised in time series due to the lack of self-supervised pre-training paradigms that account for the heterogeneity of time series across domains.
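
The abstract names a normalised cross-correlation loss but does not give its form. As a rough illustration only, the sketch below computes the standard Pearson-style normalised cross-correlation (NCC) between a reconstructed and a reference multi-variate series and turns it into a loss as 1 - NCC. The function names, the (variates, time) array layout, the `eps` stabiliser, and the 1 - NCC form are all assumptions made for this example, not the formulation used in OTiS.

```python
import numpy as np

def normalised_cross_correlation(x, y, eps=1e-8):
    """Per-variate NCC of two arrays shaped (variates, time).

    Hypothetical helper: standard zero-lag, mean-centred correlation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Remove each variate's mean so the score ignores constant offsets.
    xc = x - x.mean(axis=-1, keepdims=True)
    yc = y - y.mean(axis=-1, keepdims=True)
    num = (xc * yc).sum(axis=-1)
    # Dividing by the product of norms makes the score scale-invariant.
    den = np.sqrt((xc ** 2).sum(axis=-1) * (yc ** 2).sum(axis=-1)) + eps
    return num / den

def ncc_loss(pred, target):
    """Assumed loss form: 1 - mean NCC, near 0 when shapes match, near 2 when inverted."""
    return float(1.0 - normalised_cross_correlation(pred, target).mean())

# Two variates, 100 time points each.
t = np.linspace(0.0, 2.0 * np.pi, 100)
target = np.stack([np.sin(t), np.cos(t)])
rescaled = 2.0 * target + 3.0   # same shape, different scale and offset
inverted = -target              # opposite shape

loss_rescaled = ncc_loss(rescaled, target)
loss_inverted = ncc_loss(inverted, target)
```

Because the score is invariant to per-variate scale and offset, a loss of this kind rewards matching the overall shape of a signal over its full length rather than point-wise amplitudes, which is one plausible reading of why such a term would help with long-range dependencies.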


