CosAE: Learnable Fourier Series for Image Restoration

Neural Information Processing Systems

In this paper, we introduce Cosine Autoencoder (CosAE), a novel, generic Autoencoder that seamlessly combines the classic Fourier series with a feed-forward neural network. CosAE represents an input image as a series of 2D Cosine time series, each defined by a tuple of learnable frequency and Fourier coefficients. This method stands in contrast to conventional Autoencoders, which often sacrifice detail in their reduced-resolution bottleneck latent spaces.
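To make the idea of representing an image as a sum of 2D cosine terms concrete, here is a minimal sketch. It assumes each term has the textbook form a·cos(t) + b·sin(t) with t = 2π(u·x + v·y); the abstract does not give the exact parameterization CosAE uses, and the names `cosine_term` and `render` are hypothetical. In CosAE the frequencies and coefficients would be learned by a network, whereas here they are fixed by hand.

```python
import math

def cosine_term(freq, coeffs, x, y):
    """Evaluate one 2D harmonic a*cos(t) + b*sin(t), with t = 2*pi*(u*x + v*y)."""
    u, v = freq        # frequency pair (assumed form, not taken from the paper)
    a, b = coeffs      # Fourier coefficients of this term
    t = 2.0 * math.pi * (u * x + v * y)
    return a * math.cos(t) + b * math.sin(t)

def render(terms, height, width):
    """Sum all harmonics on a height x width grid sampled over [0, 1) x [0, 1)."""
    return [
        [sum(cosine_term(f, c, col / width, row / height) for f, c in terms)
         for col in range(width)]
        for row in range(height)
    ]

# Two hand-picked terms: one varying along x, one varying along y.
terms = [((1.0, 0.0), (1.0, 0.0)), ((0.0, 2.0), (0.5, 0.0))]
image = render(terms, 4, 4)
```

Because every pixel is evaluated from continuous coordinates, the same set of terms can be rendered at any resolution by changing `height` and `width`.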


SigTime: Learning and Visually Explaining Time Series Signatures

Huang, Yu-Chia, Chen, Juntong, Liu, Dongyu, Ma, Kwan-Liu

arXiv.org Machine Learning

Understanding and distinguishing temporal patterns in time series data is essential for scientific discovery and decision-making. For example, in biomedical research, uncovering meaningful patterns in physiological signals can improve diagnosis, risk assessment, and patient outcomes. However, existing methods for time series pattern discovery face major challenges, including high computational complexity, limited interpretability, and difficulty in capturing meaningful temporal structures. To address these gaps, we introduce a novel learning framework that jointly trains two Transformer models using complementary time series representations: shapelet-based representations to capture localized temporal structures and traditional feature engineering to encode statistical properties. The learned shapelets serve as interpretable signatures that differentiate time series across classification labels. Additionally, we develop a visual analytics system -- SigTime -- with coordinated views that facilitate exploration of time series signatures from multiple perspectives and help generate useful insights. We quantitatively evaluate our learning framework on eight publicly available datasets and one proprietary clinical dataset. We also demonstrate the effectiveness of our system through two usage scenarios conducted with domain experts: one involving public ECG data and the other focused on preterm labor analysis.


Instruction-based Time Series Editing

Qiu, Jiaxing, Guo, Dongliang, Sullivan, Brynne, Henry, Teague R., Hartvigsen, Thomas

arXiv.org Artificial Intelligence

In time series editing, we aim to modify some properties of a given time series without altering others. For example, when analyzing a hospital patient's blood pressure, we may add a sudden early drop and observe how it impacts their future while preserving other conditions. Existing diffusion-based editors rely on rigid, predefined attribute vectors as conditions and produce all-or-nothing edits through sampling. This attribute- and sampling-based approach limits flexibility in condition format and lacks customizable control over editing strength. To overcome these limitations, we introduce Instruction-based Time Series Editing, where users specify intended edits using natural language. This allows users to express a wider range of edits in a more accessible format. We then introduce InstructTime, the first instruction-based time series editor. InstructTime takes in time series and instructions, embeds them into a shared multi-modal representation space, then decodes their embeddings to generate edited time series. By learning a structured multi-modal representation space, we can easily interpolate between embeddings to achieve varying degrees of edit. To handle local and global edits together, we propose multi-resolution encoders. In our experiments, we use synthetic and real datasets and find that InstructTime is a state-of-the-art time series editor: InstructTime achieves high-quality edits with controllable strength, can generalize to unseen instructions, and can be easily adapted to unseen conditions through few-shot learning.
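The controllable edit strength described above can be illustrated with a small sketch. It assumes plain linear interpolation between two embedding vectors; the abstract says only that embeddings are interpolated, so the blending rule, the function name `interpolate`, and the `strength` parameter are all hypothetical.

```python
def interpolate(z_source, z_edited, strength):
    """Blend two embeddings; strength 0.0 keeps the source, 1.0 applies the full edit."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    # Linear blend per dimension (an assumed rule, not confirmed by the abstract).
    return [(1.0 - strength) * s + strength * e
            for s, e in zip(z_source, z_edited)]

# A half-strength edit lands midway between the two embeddings.
half_edit = interpolate([0.0, 2.0], [1.0, 4.0], 0.5)
```

In a full system, the blended vector would then be passed to the decoder to produce a partially edited time series.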





ARIMA_PLUS: Large-scale, Accurate, Automatic and Interpretable In-Database Time Series Forecasting and Anomaly Detection in Google BigQuery

Cheng, Xi, Shen, Weijie, Chen, Haoming, Shen, Chaoyi, Ortega, Jean, Liu, Jiashang, Thomas, Steve, Zheng, Honglin, Wu, Haoyun, Li, Yuxiang, Lichtendahl, Casey, Ortiz, Jenny, Liu, Gang, Qi, Haiyang, Fatemieh, Omid, Fry, Chris, Long, Jing Jing

arXiv.org Artificial Intelligence

Time series forecasting and anomaly detection are common tasks for practitioners in industries such as retail, manufacturing, advertising and energy. Two challenges stand out: (1) efficiently and accurately forecasting time series or detecting anomalies in large volumes automatically; and (2) ensuring interpretability of results to effectively incorporate business insights. We present ARIMA_PLUS, a novel framework that overcomes these two challenges through a unique combination of (a) accurate and interpretable time series models and (b) scalable and fully managed system infrastructure. The model has a sequential and modular structure to handle different components of the time series, including holiday effects, seasonality, trend, and anomalies, which enables high interpretability of the results. Novel enhancements are made to each module, and a unified framework is established to address both forecasting and anomaly detection tasks simultaneously. In terms of accuracy, a comprehensive benchmark on the 42 public datasets in the Monash forecasting repository shows superior performance over not only well-established statistical alternatives (such as ETS, ARIMA, TBATS, Prophet) but also newer neural network models (such as DeepAR, N-BEATS, PatchTST, TimeMixer). In terms of infrastructure, it is built directly into the query engine of BigQuery in Google Cloud. It uses a simple SQL interface and automates tedious technicalities such as data cleaning and model selection. It scales automatically with managed cloud computational and storage resources, making it possible to forecast 100 million time series in only 1.5 hours, a throughput of more than 18000 time series per second. In terms of interpretability, we present several case studies to demonstrate the time series insights it generates and the customizability it offers.
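The throughput figure quoted above follows directly from the two numbers in the abstract, as this short check shows. The variable names are illustrative only.

```python
# Figures taken from the abstract: 100 million series forecast in 1.5 hours.
SERIES = 100_000_000
HOURS = 1.5

seconds = HOURS * 3600           # 5400 seconds
throughput = SERIES / seconds    # series forecast per second

print(f"{throughput:.0f} time series per second")
```

The result is consistent with the stated "more than 18000 time series per second".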


TimeWak: Temporal Chained-Hashing Watermark for Time Series Data

Soi, Zhi Wen, Zhu, Chaoyi, Abiad, Fouad, Shankar, Aditya, Galjaard, Jeroen M., Wang, Huijuan, Chen, Lydia Y.

arXiv.org Artificial Intelligence

Synthetic time series generated by diffusion models enable sharing privacy-sensitive datasets, such as patients' functional MRI records. Key criteria for synthetic data include high data utility and traceability to verify the data source. Recent watermarking methods embed in homogeneous latent spaces, but state-of-the-art time series generators operate in data space, making latent-based watermarking incompatible. This creates the challenge of watermarking directly in data space while handling feature heterogeneity and temporal dependencies. We propose TimeWak, the first watermarking algorithm for multivariate time series diffusion models. To handle temporal dependence and spatial heterogeneity, TimeWak embeds a temporal chained-hashing watermark directly within the temporal-feature data space. Its other distinctive feature is $ε$-exact inversion, which addresses the non-uniform reconstruction error distribution across features that arises when inverting the diffusion process to detect watermarks. We derive the error bound of inverting multivariate time series while preserving robust watermark detectability. We extensively evaluate TimeWak on its impact on synthetic data quality, watermark detectability, and robustness under various post-editing attacks, against five datasets and baselines of different temporal lengths. Our results show that TimeWak achieves improvements of 61.96% in context-FID score and 8.44% in correlational scores against the strongest state-of-the-art baseline, while remaining consistently detectable.



Mind the Missing: Variable-Aware Representation Learning for Irregular EHR Time Series using Large Language Models

Kwon, Jeong Eul, Yoon, Joo Heung, Lee, Hyo Kyung

arXiv.org Artificial Intelligence

Irregular sampling and high missingness are intrinsic challenges in modeling time series derived from electronic health records (EHRs), where clinical variables are measured at uneven intervals depending on workflow and intervention timing. To address this, we propose VITAL -- a variable-aware, large language model (LLM)-based framework tailored for learning from irregularly sampled physiological time series. VITAL differentiates between two distinct types of clinical variables: vital signs, which are frequently recorded and exhibit temporal patterns, and laboratory tests, which are measured sporadically and lack temporal structure. It reprograms vital signs into the language space, enabling the LLM to capture temporal context and reason over missing values through explicit encoding. In contrast, laboratory variables are embedded either using representative summary values or a learnable [Not measured] token, depending on their availability. Extensive evaluations on benchmark datasets from PhysioNet demonstrate that VITAL outperforms state-of-the-art methods designed for irregular time series. Furthermore, it maintains robust performance under high levels of missingness, which is prevalent in real-world clinical scenarios where key variables are often unavailable.

Introduction

Electronic Health Records (EHRs) digitally capture a wealth of patient data generated during routine clinical care. In particular, the Intensive Care Unit (ICU) is a data-rich environment due to the need for continuous, high-resolution patient monitoring. This has led to a surge of research in medical artificial intelligence (AI), with many studies leveraging publicly available EHR datasets in combination with machine learning techniques for tasks such as early warning, outcome prediction and patient stratification [1, 2, 3, 4, 5, 6, 7, 8, 9]. A common approach in these studies is to model patient records as multivariate time series, capturing the temporal evolution of physiological and clinical variables. However, in practice, EHR time series are often irregularly sampled due to variations in clinical workflows, measurement protocols, and intervention timing.
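The availability-dependent handling of laboratory variables described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the VITAL implementation: the median is a stand-in for the unspecified "representative summary value", a plain string stands in for the learnable [Not measured] token, and the function name `lab_input` is hypothetical.

```python
from statistics import median

# Placeholder for the learnable token; in a model this would be an embedding.
NOT_MEASURED = "[Not measured]"

def lab_input(values):
    """Return a summary of the observed lab values, or the token if none exist."""
    observed = [v for v in values if v is not None]
    # Median is an assumed summary statistic, chosen only for illustration.
    return median(observed) if observed else NOT_MEASURED

measured = lab_input([None, 1.0, None, 3.0])   # some values observed
missing = lab_input([None, None, None])        # never measured in the window
```

Treating the "never measured" case as its own token, rather than imputing a number, lets a model distinguish a sporadically measured test from one that was not ordered at all.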