Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models

Xu Liu, Taha Aksu, Juncheng Liu, Qingsong Wen, Yuxuan Liang, Caiming Xiong, Silvio Savarese, Doyen Sahoo, Junnan Li, Chenghao Liu

arXiv.org Artificial Intelligence 

Time series analysis is crucial for understanding the dynamics of complex systems. Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs), which enable generalized learning and the integration of contextual information. However, their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints. Synthetic data emerge as a viable solution, addressing these challenges by offering scalable, unbiased, and high-quality alternatives. This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs: it analyzes data generation strategies and their role in model pretraining, fine-tuning, and evaluation, and it identifies future research directions.