Challenges and Limitations of Generative AI in Synthesizing Wearable Sensor Data

Di Martino, Flavio, Delmastro, Franca

arXiv.org Artificial Intelligence

The widespread adoption of wearable sensors has the potential to provide massive and heterogeneous time series data, driving the use of Artificial Intelligence in human sensing applications. However, data collection remains limited due to stringent ethical regulations, privacy concerns, and other constraints, hindering progress in the field. Synthetic data generation, particularly through Generative Adversarial Networks and Diffusion Models, has emerged as a promising solution to mitigate both data scarcity and privacy issues. However, these models are often limited to narrow operational scenarios, such as short-term and unimodal signal patterns. To address this gap, we present a systematic evaluation of state-of-the-art generative models for time series data, explicitly assessing their performance in challenging scenarios such as stress and emotion recognition. Our study examines the extent to which these models can jointly handle multi-modality, capture long-range dependencies, and support conditional generation, all core requirements for real-world wearable sensor data generation. To enable a fair and rigorous comparison, we also introduce an evaluation framework that assesses both the intrinsic fidelity of the generated data and their utility in downstream predictive tasks. Our findings reveal critical limitations in the existing approaches, particularly in maintaining cross-modal consistency, preserving temporal coherence, and ensuring robust performance in train-on-synthetic, test-on-real, and data augmentation scenarios. Finally, we present our future research directions to enhance synthetic time series generation and improve the applicability of generative models in the wearable computing domain.



ANT: Adaptive Noise Schedule for Time Series Diffusion Models

Neural Information Processing Systems

Advances in diffusion models for generative artificial intelligence have recently propagated to the time series (TS) domain, demonstrating state-of-the-art performance on various tasks. However, prior works on TS diffusion models often borrow the framework of existing works proposed in other domains without considering the characteristics of TS data, leading to suboptimal performance. In this work, we propose Adaptive Noise schedule for Time series diffusion models (ANT), which automatically predetermines proper noise schedules for given TS datasets based on their statistics representing non-stationarity. Our intuition is that an optimal noise schedule should satisfy the following desiderata: 1) it linearly reduces the non-stationarity of TS data so that all diffusion steps are equally meaningful, 2) the data is corrupted to random noise at the final step, and 3) the number of steps is sufficiently large. The proposed method is practical for use in that it eliminates the necessity of finding the optimal noise schedule with a small additional cost to compute the statistics for given datasets, which can be done offline before training. Code is available at this repository: https://github.com/seunghan96/ANT.
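
The second desideratum, that the data should be corrupted to random noise at the final step, can be checked for any candidate schedule. The following is a minimal sketch and not the ANT method itself: it assumes the standard DDPM forward process with a linear beta schedule, and the function names and default beta range are hypothetical choices made for illustration. It computes the cumulative signal coefficient (the product of 1 - beta over all steps so far), which should be close to zero at the last step if the data is to end up as noise.

```python
def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced per-step noise variances (assumed DDPM-style defaults)."""
    if num_steps == 1:
        return [beta_end]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """Running product of (1 - beta): the fraction of signal variance kept."""
    kept = []
    product = 1.0
    for beta in betas:
        product *= 1.0 - beta
        kept.append(product)
    return kept


if __name__ == "__main__":
    # Compare how much signal survives the final step for different step counts.
    for num_steps in (10, 100, 1000):
        final = cumulative_signal(linear_betas(num_steps))[-1]
        print(f"T={num_steps}: signal kept at final step = {final:.6f}")
```

With this fixed beta range, a short schedule leaves most of the signal intact at the final step, while a long one drives it towards zero. This illustrates why the choice of schedule and the number of steps interact.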


Addressing Challenges in Time Series Forecasting: A Comprehensive Comparison of Machine Learning Techniques

Mortezanejad, Seyedeh Azadeh Fallah, Wang, Ruochen

arXiv.org Artificial Intelligence

The explosion of Time Series (TS) data, driven by advancements in technology, necessitates sophisticated analytical methods. Modern management systems increasingly rely on analyzing this data, highlighting the importance of efficient processing techniques. State-of-the-art Machine Learning (ML) approaches for TS analysis and forecasting are becoming prevalent. This paper briefly describes and compiles suitable algorithms for TS regression tasks. We compare these algorithms against each other and the classic ARIMA method using diverse datasets: complete data, data with outliers, and data with missing values. The focus is on forecasting accuracy, particularly for long-term predictions. This research aids in selecting the most appropriate algorithm based on forecasting needs and data characteristics.


Multivariate Time Series Anomaly Detection in Industry 5.0

Colombi, Lorenzo, Vespa, Michela, Belletti, Nicolas, Brina, Matteo, Dahdal, Simon, Tabanelli, Filippo, Bellodi, Elena, Tortonesi, Mauro, Stefanelli, Cesare, Vignoli, Massimiliano

arXiv.org Artificial Intelligence

Industry 5.0 environments present a critical need for effective anomaly detection methods that can indicate equipment malfunctions, process inefficiencies, or potential safety hazards. The ever-increasing sensorization of manufacturing lines makes processes more observable, but also poses the challenge of continuously analyzing vast amounts of multivariate time series data. These challenges include data quality since data may contain noise, be unlabeled or even mislabeled. A promising approach consists of combining an embedding model with other Machine Learning algorithms to enhance the overall performance in detecting anomalies. Moreover, representing time series as vectors brings many advantages like higher flexibility and improved ability to capture complex temporal dependencies. We tested our solution in a real industrial use case, using data collected from a Bonfiglioli plant. The results demonstrate that, unlike traditional reconstruction-based autoencoders, which often struggle in the presence of sporadic noise, our embedding-based framework maintains high performance across various noise conditions.


Context-Alignment: Activating and Enhancing LLM Capabilities in Time Series

Hu, Yuxiao, Li, Qian, Zhang, Dongxiao, Yan, Jinyue, Chen, Yuntian

arXiv.org Artificial Intelligence

Recently, leveraging pre-trained Large Language Models (LLMs) for time series (TS) tasks has gained increasing attention, which involves activating and enhancing LLMs' capabilities. Many methods aim to activate LLMs' capabilities based on token-level alignment but overlook LLMs' inherent strength in natural language processing -- their deep understanding of linguistic logic and structure rather than superficial embedding processing. We propose Context-Alignment, a new paradigm that aligns TS with a linguistic component in the language environments familiar to LLMs to enable LLMs to contextualize and comprehend TS data, thereby activating their capabilities. Specifically, such context-level alignment comprises structural alignment and logical alignment, which are achieved by Dual-Scale Context-Alignment GNNs (DSCA-GNNs) applied to TS-language multimodal inputs. Structural alignment utilizes dual-scale nodes to describe hierarchical structure in TS-language, enabling LLMs to treat long TS data as a whole linguistic component while preserving intrinsic token features. Logical alignment uses directed edges to guide logical relationships, ensuring coherence in the contextual semantics. Demonstration example prompts are employed to construct Demonstration Examples based Context-Alignment (DECA) following the DSCA-GNNs framework. DECA can be flexibly and repeatedly integrated into various layers of pre-trained LLMs to improve awareness of logic and structure, thereby enhancing performance. Extensive experiments show the effectiveness of DECA and the importance of Context-Alignment across tasks, particularly in few-shot and zero-shot forecasting, confirming that Context-Alignment provides powerful prior knowledge on context.


Abstracted Shapes as Tokens -- A Generalizable and Interpretable Model for Time-series Classification

Wen, Yunshi, Ma, Tengfei, Weng, Tsui-Wei, Nguyen, Lam M., Julius, Anak Agung

arXiv.org Machine Learning

In time-series analysis, many recent works seek to provide a unified view and representation for time-series across multiple domains, leading to the development of foundation models for time-series data. Despite diverse modeling techniques, existing models are black boxes and fail to provide insights and explanations about their representations. In this paper, we present VQShape, a pre-trained, generalizable, and interpretable model for time-series representation learning and classification. By introducing a novel representation for time-series data, we forge a connection between the latent space of VQShape and shape-level features. Using vector quantization, we show that time-series from different domains can be described using a unified set of low-dimensional codes, where each code can be represented as an abstracted shape in the time domain. On classification tasks, we show that the representations of VQShape can be utilized to build interpretable classifiers, achieving comparable performance to specialist models. Additionally, in zero-shot learning, VQShape and its codebook can generalize to previously unseen datasets and domains that are not included in the pre-training process. The code and pre-trained weights are available at https://github.com/YunshiWen/VQShape.
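
The vector quantization step mentioned above can be illustrated generically. This is a minimal sketch of plain nearest-neighbour quantization, not VQShape's actual architecture: the function name `quantize` and the toy two-dimensional codebook are hypothetical, chosen only to show how continuous latent vectors are mapped to indices in a shared set of codes.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry.

    Distance is squared Euclidean; ties go to the lowest index.
    """
    codes = []
    for vector in vectors:
        distances = [
            sum((v - c) ** 2 for v, c in zip(vector, entry))
            for entry in codebook
        ]
        codes.append(distances.index(min(distances)))
    return codes


if __name__ == "__main__":
    # Toy codebook with two entries (illustrative values only).
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    latents = [[0.1, -0.1], [0.9, 1.2]]
    print(quantize(latents, codebook))
```

In a learned model the codebook entries are trained rather than fixed, and each code can then be decoded back into a representative pattern.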


Scalable Temporal Anomaly Causality Discovery in Large Systems: Achieving Computational Efficiency with Binary Anomaly Flag Data

Asres, Mulugeta Weldezgina, Omlin, Christian Walter, Collaboration, The CMS-HCAL

arXiv.org Machine Learning

Extracting anomaly causality facilitates diagnostics once monitoring systems detect system faults. Identifying anomaly causes in large systems involves investigating a more extensive set of monitoring variables across multiple subsystems. However, learning causal graphs comes with a significant computational burden that restrains the applicability of most existing methods in real-time and large-scale deployments. In addition, modern monitoring applications for large systems often generate large amounts of binary alarm flags, and the distinct characteristics of binary anomaly data -- the meaning of state transition and data sparsity -- challenge existing causality learning mechanisms. This study proposes an anomaly causal discovery approach (AnomalyCD), addressing the accuracy and computational challenges of generating causal graphs from binary flag data sets. The AnomalyCD framework presents several strategies, such as anomaly flag characteristics incorporating causality testing, sparse data and link compression, and edge pruning adjustment approaches. We validate the performance of this framework on two datasets: monitoring sensor data of the readout-box system of the Compact Muon Solenoid experiment at CERN, and a public data set for information technology monitoring. The results demonstrate the considerable reduction of the computation overhead and moderate enhancement of the accuracy of temporal causal discovery on binary anomaly data sets.