Zhuang, Zhongfang
Error-bounded Approximate Time Series Joins Using Compact Dictionary Representations of Time Series
Yeh, Chin-Chia Michael, Zheng, Yan, Wang, Junpeng, Chen, Huiyuan, Zhuang, Zhongfang, Zhang, Wei, Keogh, Eamonn
The matrix profile is an effective data mining tool that provides similarity join functionality for time series data. Users of the matrix profile can either join a time series with itself using intra-similarity join (i.e., self-join) or join a time series with another time series using inter-similarity join. By invoking either or both types of joins, the matrix profile can help users discover both conserved and anomalous structures in the data. Since the introduction of the matrix profile five years ago, multiple efforts have been made to speed up the computation with approximate joins; however, the majority of these efforts only focus on self-joins. In this work, we show that it is possible to efficiently perform approximate inter-time series similarity joins with error-bounded guarantees by creating a compact "dictionary" representation of time series. Using the dictionary representation instead of the original time series, we are able to improve the throughput of an anomaly mining system by at least 20X, with essentially no decrease in accuracy. As a side effect, the dictionaries also summarize the time series in a semantically meaningful way and can provide intuitive and actionable insights. We demonstrate the utility of our dictionary-based inter-time series similarity joins on domains as diverse as medicine and transportation.
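A minimal numpy sketch of the idea, not the authors' implementation: summarize series A into a small dictionary of representative subsequences, then join every subsequence of series B against the dictionary instead of against all of A. The greedy dictionary builder and all parameter choices below are illustrative assumptions; the throughput gain comes from the dictionary being much smaller than the original series.

```python
import numpy as np

def znorm(x, eps=1e-8):
    """Z-normalize a subsequence so comparisons are scale/offset invariant."""
    return (x - x.mean()) / (x.std() + eps)

def build_dictionary(ts, m, num_entries):
    """Greedily pick subsequences of length m that are far from the entries
    chosen so far (a crude stand-in for the paper's dictionary)."""
    subs = [znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)]
    dictionary = [subs[0]]
    for _ in range(num_entries - 1):
        # pick the subsequence whose nearest dictionary entry is farthest away
        dists = [min(np.linalg.norm(s - d) for d in dictionary) for s in subs]
        dictionary.append(subs[int(np.argmax(dists))])
    return dictionary

def approx_ab_join(ts_b, dictionary, m):
    """Approximate matrix profile of B joined against A's dictionary: for
    every subsequence of B, the distance to its nearest dictionary entry."""
    profile = np.empty(len(ts_b) - m + 1)
    for i in range(len(profile)):
        q = znorm(ts_b[i:i + m])
        profile[i] = min(np.linalg.norm(q - d) for d in dictionary)
    return profile

rng = np.random.default_rng(0)
ts_a, ts_b, m = rng.standard_normal(400), rng.standard_normal(300), 32
profile = approx_ab_join(ts_b, build_dictionary(ts_a, m, num_entries=8), m)
print("likely anomaly in B at offset", int(np.argmax(profile)))
```

Each query now touches only 8 dictionary entries rather than every subsequence of A, which is where the speedup over an exact AB-join comes from.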
FATA-Trans: Field And Time-Aware Transformer for Sequential Tabular Data
Zhang, Dongyu, Wang, Liang, Dai, Xin, Jain, Shubham, Wang, Junpeng, Fan, Yujie, Yeh, Chin-Chia Michael, Zheng, Yan, Zhuang, Zhongfang, Zhang, Wei
Sequential tabular data is one of the most commonly used data types in real-world applications. Different from conventional tabular data, where rows in a table are independent, sequential tabular data contains rich contextual and sequential information, where some fields are dynamically changing over time and others are static. Existing transformer-based approaches analyzing sequential tabular data overlook the differences between dynamic and static fields by replicating and filling static fields into each transformer input, and ignore temporal information between rows, which leads to three major disadvantages: (1) computational overhead, (2) artificially simplified data for the masked language modeling pre-training task, which may yield less meaningful representations, and (3) disregard for the temporal behavioral patterns implied by time intervals. In this work, we propose FATA-Trans, a model with two field transformers for modeling sequential tabular data, in which static and dynamic field information are processed separately. FATA-Trans is field- and time-aware for sequential tabular data. The field-type embedding in the method enables FATA-Trans to capture differences between static and dynamic fields. The time-aware position embedding exploits both order and time interval information between rows, which helps the model detect underlying temporal behavior in a sequence. Our experiments on three benchmark datasets demonstrate that the learned representations from FATA-Trans consistently outperform state-of-the-art solutions in the downstream tasks. We also present visualization studies to highlight the insights captured by the learned representations, enhancing our understanding of the underlying data. Our code is available at https://github.com/zdy93/FATA-Trans.
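The two embeddings named above can be sketched in a few lines of PyTorch. Everything here (the log-bucketing of time gaps, the sizes, the `TimeAwareFieldEmbedding` name) is an illustrative assumption, not the released FATA-Trans code; see the repository linked above for that.

```python
import torch
import torch.nn as nn

class TimeAwareFieldEmbedding(nn.Module):
    """Sketch of the two embeddings described above: a field-type embedding
    (static vs. dynamic) plus a time-aware position embedding that encodes
    both row order and the time gap between consecutive rows."""
    def __init__(self, d_model, num_buckets=32, max_rows=512):
        super().__init__()
        self.field_type = nn.Embedding(2, d_model)       # 0 = static, 1 = dynamic
        self.position = nn.Embedding(max_rows, d_model)  # row order
        self.time_gap = nn.Embedding(num_buckets, d_model)
        self.num_buckets = num_buckets

    def forward(self, field_emb, is_dynamic, timestamps):
        # field_emb: (batch, rows, d_model); timestamps: (batch, rows) in seconds
        batch, rows, _ = field_emb.shape
        pos = torch.arange(rows, device=field_emb.device).expand(batch, rows)
        gaps = torch.diff(timestamps, prepend=timestamps[:, :1], dim=1)
        # log-bucketize gaps so both short and long intervals stay resolved
        buckets = torch.log1p(gaps.clamp(min=0)).long().clamp(max=self.num_buckets - 1)
        return (field_emb + self.field_type(is_dynamic)
                + self.position(pos) + self.time_gap(buckets))

emb = TimeAwareFieldEmbedding(d_model=64)
x = torch.randn(2, 10, 64)
out = emb(x, torch.ones(2, 10, dtype=torch.long),
          torch.cumsum(torch.rand(2, 10) * 3600, dim=1))
print(out.shape)  # torch.Size([2, 10, 64])
```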
Multitask Learning for Time Series Data with 2D Convolution
Yeh, Chin-Chia Michael, Dai, Xin, Zheng, Yan, Wang, Junpeng, Chen, Huiyuan, Fan, Yujie, Der, Audrey, Zhuang, Zhongfang, Wang, Liang, Zhang, Wei
Multitask learning (MTL) aims to develop a unified model that can handle a set of closely related tasks simultaneously. By optimizing the model across multiple tasks, MTL generally surpasses its non-MTL counterparts in terms of generalizability. Although MTL has been extensively researched in various domains such as computer vision, natural language processing, and recommendation systems, its application to time series data has received limited attention. In this paper, we investigate the application of MTL to the time series classification (TSC) problem. However, when we integrate the state-of-the-art 1D convolution-based TSC model with MTL, the performance of the TSC model actually deteriorates. A comparison of the 1D convolution-based models with the Dynamic Time Warping (DTW) distance function suggests that the underwhelming results stem from the limited expressive power of the 1D convolutional layers. To overcome this challenge, we propose a novel design for a 2D convolution-based model that enhances the model's expressiveness. Leveraging this advantage, our proposed method outperforms competing approaches on both the UCR Archive and an industrial transaction TSC dataset.
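One hedged way to picture a 2D convolution-based multitask TSC model: lift each univariate series into a 2D map by stacking its sliding windows, share a Conv2d encoder across tasks, and attach one linear head per task. The lifting scheme and all sizes are assumptions made for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultitaskTSC2D(nn.Module):
    """Hedged sketch: 2D filters see interactions across neighboring windows
    as well as along time, which is one way to get more expressive power than
    1D filters over the raw series; one classification head per task."""
    def __init__(self, window=16, stride=4, num_classes_per_task=(5, 3)):
        super().__init__()
        self.window, self.stride = window, stride
        self.encoder = nn.Sequential(          # shared across all tasks
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleList(nn.Linear(32, c) for c in num_classes_per_task)

    def forward(self, x, task_id):
        # x: (batch, length) -> (batch, 1, num_windows, window)
        img = x.unfold(1, self.window, self.stride).unsqueeze(1)
        return self.heads[task_id](self.encoder(img))

model = MultitaskTSC2D()
logits = model(torch.randn(8, 128), task_id=0)
print(logits.shape)  # torch.Size([8, 5])
```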
Toward a Foundation Model for Time Series Data
Yeh, Chin-Chia Michael, Dai, Xin, Chen, Huiyuan, Zheng, Yan, Fan, Yujie, Der, Audrey, Lai, Vivian, Zhuang, Zhongfang, Wang, Junpeng, Wang, Liang, Zhang, Wei
A foundation model is a machine learning model trained on a large and diverse set of data, typically using self-supervised learning-based pre-training techniques, that can be adapted to various downstream tasks. However, current research on time series pre-training has predominantly focused on models trained exclusively on data from a single domain. As a result, these models possess domain-specific knowledge that may not be easily transferable to time series from other domains. In this paper, we aim to develop an effective time series foundation model by leveraging unlabeled samples from multiple domains. To achieve this, we repurposed the publicly available UCR Archive and evaluated four existing self-supervised learning-based pre-training methods, along with a novel method, on the datasets. We tested these methods using four popular neural network architectures for time series to understand how the pre-training methods interact with different network designs. Our experimental results show that pre-training improves downstream classification tasks by enhancing the convergence of the fine-tuning process. Furthermore, we found that the proposed pre-training method, when combined with the Transformer model, outperforms the alternatives.
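A toy sketch of one family of self-supervised objectives evaluated in this line of work: masked reconstruction with a Transformer encoder over series pooled from many domains. The masking ratio, sizes, and training loop are illustrative assumptions, not the paper's exact method.

```python
import torch
import torch.nn as nn

class MaskedSeriesPretrainer(nn.Module):
    """Mask random timesteps of an unlabeled series and train a Transformer
    encoder to reconstruct them; the encoder can later be fine-tuned for
    downstream classification. Details here are illustrative."""
    def __init__(self, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x, mask_ratio=0.15):
        # x: (batch, length); zero out masked steps, reconstruct everywhere
        mask = torch.rand_like(x) < mask_ratio
        h = self.encoder(self.embed((x * ~mask).unsqueeze(-1)))
        recon = self.head(h).squeeze(-1)
        # loss is measured only on the masked positions
        return nn.functional.mse_loss(recon[mask], x[mask])

model = MaskedSeriesPretrainer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(3):  # toy pre-training steps on unlabeled, multi-domain series
    loss = model(torch.randn(16, 64))
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```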
An Efficient Content-based Time Series Retrieval System
Yeh, Chin-Chia Michael, Chen, Huiyuan, Dai, Xin, Zheng, Yan, Wang, Junpeng, Lai, Vivian, Fan, Yujie, Der, Audrey, Zhuang, Zhongfang, Wang, Liang, Zhang, Wei, Phillips, Jeff M.
A Content-based Time Series Retrieval (CTSR) system is an information retrieval system that lets users interact with time series originating from multiple domains, such as finance, healthcare, and manufacturing. For example, users seeking to learn more about the source of a time series can submit the time series as a query to the CTSR system and retrieve a list of relevant time series with associated metadata. By analyzing the retrieved metadata, users can gather more information about the source of the time series. Because the CTSR system is required to work with time series data from diverse domains, it needs a high-capacity model to effectively measure the similarity between different time series. On top of that, the model within the CTSR system has to compute the similarity scores efficiently, as users interact with the system in real time. In this paper, we propose an effective and efficient CTSR model that outperforms alternative models while still providing reasonable inference runtimes. To demonstrate the capability of the proposed method in solving business problems, we compare it against alternative models using our in-house transaction data. Our findings reveal that the proposed model is the most suitable solution among those compared for our transaction data problem.
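The serving pattern described above (high-capacity scoring under real-time constraints) can be hedged into a small numpy sketch: encode the database offline, score a query with a single matrix-vector product, and return the top-k series with their metadata. The DFT-magnitude encoder below is a placeholder assumption standing in for the learned CTSR model.

```python
import numpy as np

def encode(ts, dim=16):
    """Placeholder feature encoder: first `dim` DFT magnitudes of the
    z-normalized series. The actual CTSR model is a learned, higher-capacity
    similarity model; this only illustrates the retrieval pipeline."""
    spec = np.abs(np.fft.rfft((ts - ts.mean()) / (ts.std() + 1e-8)))
    vec = spec[:dim]
    return vec / (np.linalg.norm(vec) + 1e-8)

def retrieve(query, db_vecs, metadata, k=3):
    """Real-time path: one matrix-vector product scores the whole database,
    then the top-k series (with metadata) are returned to the user."""
    scores = db_vecs @ encode(query)
    top = np.argsort(-scores)[:k]
    return [(metadata[i], float(scores[i])) for i in top]

rng = np.random.default_rng(1)
database = [rng.standard_normal(256) for _ in range(100)]
metadata = [f"source-{i}" for i in range(100)]          # e.g., domain labels
db_vecs = np.stack([encode(ts) for ts in database])     # precomputed offline
print(retrieve(database[7] + 0.05 * rng.standard_normal(256), db_vecs, metadata))
```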
PDT: Pretrained Dual Transformers for Time-aware Bipartite Graphs
Dai, Xin, Fan, Yujie, Zhuang, Zhongfang, Jain, Shubham, Yeh, Chin-Chia Michael, Wang, Junpeng, Wang, Liang, Zheng, Yan, Aboagye, Prince Osei, Zhang, Wei
Pre-training on large models is prevalent and emerging with the ever-growing user-generated content in many machine learning application categories. It has been recognized that learning contextual knowledge from the datasets depicting user-content interaction plays a vital role in downstream tasks. Despite several studies attempting to learn contextual knowledge via pre-training methods, finding an optimal training objective and strategy for this type of task remains a challenging problem. In this work, we contend that there are two distinct aspects of contextual knowledge, namely the user-side and the content-side, for datasets where user-content …

Fundamentally, a common goal of data mining applications using user-content interactions is to understand users' behaviors [17] and content's properties. Researchers attempt multiple ways to model such behaviors: the time-related nature of the interactions is a fit for sequential models, such as recurrent neural networks (RNN), and the interactions and relations can be modeled as graph neural networks (GNN). Conventionally, the training objective is to minimize the loss of a specific task such that one model is tailored to a particular application (e.g., recommendation). This approach is simple and effective for every data mining application.
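Since the extracted abstract above is incomplete, the following PyTorch sketch is only a loose, assumed reading of the dual-transformer idea: one tower per side of the bipartite interaction data, trained with an in-batch contrastive objective. None of the names, sizes, or the training objective below come from the paper.

```python
import torch
import torch.nn as nn

class DualTowerSketch(nn.Module):
    """Hedged sketch: one tower encodes a user's interaction sequence, the
    other a content item's interaction sequence, and a contrastive objective
    aligns users with the content they interacted with."""
    def __init__(self, num_items, num_users, d_model=32):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model)
        self.user_emb = nn.Embedding(num_users, d_model)
        layer = lambda: nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
        self.user_tower = nn.TransformerEncoder(layer(), 2)     # user side
        self.content_tower = nn.TransformerEncoder(layer(), 2)  # content side

    def forward(self, user_hist_items, content_hist_users):
        u = self.user_tower(self.item_emb(user_hist_items)).mean(dim=1)
        c = self.content_tower(self.user_emb(content_hist_users)).mean(dim=1)
        logits = u @ c.T  # in-batch contrastive: diagonal pairs interacted
        target = torch.arange(len(u))
        return nn.functional.cross_entropy(logits, target)

model = DualTowerSketch(num_items=1000, num_users=500)
loss = model(torch.randint(1000, (8, 20)), torch.randint(500, (8, 15)))
print(float(loss))
```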
Matrix Profile XXVII: A Novel Distance Measure for Comparing Long Time Series
Der, Audrey, Yeh, Chin-Chia Michael, Wu, Renjie, Wang, Junpeng, Zheng, Yan, Zhuang, Zhongfang, Wang, Liang, Zhang, Wei, Keogh, Eamonn
The most useful data mining primitives are distance measures. With an effective distance measure, it is possible to perform classification, clustering, anomaly detection, segmentation, etc. For single-event time series, Euclidean Distance and Dynamic Time Warping distance are known to be extremely effective. However, for time series containing cyclical behaviors, the semantic meaningfulness of such comparisons is less clear. For example, on two separate days the telemetry from an athlete's workout routine might be very similar. On the second day, however, the athlete may change the order of push-ups and squats, add repetitions of pull-ups, or omit dumbbell curls entirely. Any of these minor changes would defeat existing time series distance measures. Some bag-of-features methods have been proposed to address this problem, but we argue that in many cases, similarity is intimately tied to the shapes of subsequences within these longer time series. In such cases, summative features will lack discrimination ability. In this work we introduce PRCIS, which stands for Pattern Representation Comparison in Series. PRCIS is a distance measure for long time series, which exploits recent progress in our ability to summarize time series with dictionaries. We will demonstrate the utility of our ideas on diverse tasks and datasets.
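A numpy sketch of a PRCIS-style comparison (the real dictionaries are learned summaries; the evenly spaced sampling below is a crude stand-in): because each dictionary entry is matched to its nearest counterpart on the other side, reordering activities within a day barely changes the distance, unlike a direct whole-series comparison.

```python
import numpy as np

def znorm(x, eps=1e-8):
    return (x - x.mean()) / (x.std() + eps)

def summarize(ts, m=32, k=4):
    """Crude dictionary: k evenly spaced z-normalized subsequences of length
    m (the actual PRCIS dictionaries are learned summaries)."""
    starts = np.linspace(0, len(ts) - m, k).astype(int)
    return np.stack([znorm(ts[s:s + m]) for s in starts])

def prcis_like(dict_a, dict_b):
    """Order-invariant distance between two dictionaries: average distance
    from each pattern to its nearest pattern on the other side."""
    d = np.linalg.norm(dict_a[:, None, :] - dict_b[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

rng = np.random.default_rng(2)
day1 = np.concatenate([np.sin(np.linspace(0, 20, 300)), rng.standard_normal(300)])
day2 = np.concatenate([rng.standard_normal(300), np.sin(np.linspace(0, 20, 300))])
# same activities performed in a different order -> small dictionary distance
print(prcis_like(summarize(day1), summarize(day2)))
```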
Quantized Wasserstein Procrustes Alignment of Word Embedding Spaces
Aboagye, Prince O, Zheng, Yan, Yeh, Michael, Wang, Junpeng, Zhuang, Zhongfang, Chen, Huiyuan, Wang, Liang, Zhang, Wei, Phillips, Jeff
In natural language processing (NLP), aligning monolingual embedding spaces to induce a shared cross-lingual vector space has been shown not only to be useful in a variety of tasks such as bilingual lexicon induction (BLI) (Mikolov et al., 2013; Barone, 2016; Artetxe et al., 2017; Aboagye et al., 2022), machine translation (Artetxe et al., 2018b), and cross-lingual information retrieval (Vulić & Moens, 2015), but also to play a crucial role in facilitating the cross-lingual transfer of language technologies from high-resource languages to low-resource languages. Cross-lingual word embeddings (CLWEs) represent words from two or more languages in a shared cross-lingual vector space in which words with similar meanings obtain similar vectors regardless of their language. There has been a flurry of work dominated by the so-called projection-based CLWE models (Mikolov et al., 2013; Artetxe et al., 2016, 2017, 2018a; Smith et al., 2017; Ruder et al., 2019), which aim to improve CLWE model performance significantly. Projection-based CLWE models learn a transfer function or mapper between two independently trained monolingual word vector spaces with limited or no cross-lingual supervision. Famous among projection-based CLWE models are the unsupervised projection-based CLWE models (Artetxe et al., 2017; Lample et al., 2018; Alvarez-Melis & Jaakkola, 2018;
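A small numpy illustration of the Procrustes half of this approach, with quantization: k-means compresses each embedding space to a few centroids, and orthogonal Procrustes recovers the rotation from the centroids rather than from every word vector. In this toy, the centroid correspondence is known by construction; the actual method learns the coupling with (quantized) optimal transport, which is an assumption this sketch does not implement.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; quantizing each embedding space to k centroids is what
    makes the Wasserstein coupling tractable on large vocabularies."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        C = np.stack([X[assign == j].mean(0) if (assign == j).any() else C[j]
                      for j in range(k)])
    return C

def procrustes(X, Y):
    """Orthogonal Procrustes: rotation W minimizing ||XW - Y||_F, via SVD."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy setup: the target space Y is an exactly rotated copy of source space X.
rng = np.random.default_rng(3)
X = rng.standard_normal((2000, 16))
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)))  # ground-truth rotation
Y = X @ Q
# Align the 32 centroids instead of all 2000 vectors (matched by construction
# here, since k-means is rotation-equivariant with identical initialization).
W = procrustes(kmeans(X, 32), kmeans(Y, 32))
print(np.linalg.norm(X @ W - Y) / np.linalg.norm(Y))  # ~0: rotation recovered
```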
One-Shot Learning on Attributed Sequences
Zhuang, Zhongfang, Kong, Xiangnan, Rundensteiner, Elke, Arora, Aditya, Zouaoui, Jihane
One-shot learning has become an important research topic in the last decade, with many real-world applications. The goal of one-shot learning is to classify unlabeled instances when there is only one labeled example per class. The conventional problem setting of one-shot learning mainly focuses on data that is already in a feature space (such as images). However, the data instances in real-world applications are often more complex, and feature vectors may not be available. In this paper, we study the problem of one-shot learning on attributed sequences, where each instance is composed of a set of attributes (e.g., user profile) and a sequence of categorical items (e.g., clickstream). This problem is important for a variety of real-world applications ranging from fraud prevention to network intrusion detection. It is also more challenging than conventional one-shot learning because of the dependencies between attributes and sequences. We design a deep learning framework, OLAS, to tackle this problem. OLAS utilizes a twin network to generalize the features from pairwise attributed sequence examples. Empirical results on real-world datasets demonstrate that OLAS can outperform state-of-the-art methods under a rich variety of parameter settings.
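A hedged PyTorch sketch in the spirit of a twin network over attributed sequences: one shared encoder consumes both the attributes and the item sequence (the attribute encoding initializes the sequence encoder's state, one simple way to model their dependency), and a contrastive loss is computed on example pairs. Architecture and names are assumptions, not the OLAS code.

```python
import torch
import torch.nn as nn

class TwinAttributedSeqNet(nn.Module):
    """Shared encoder applied to both members of a pair (the 'twin' use);
    at one-shot inference time, an unlabeled instance would be assigned the
    class of its nearest labeled example in this embedding space."""
    def __init__(self, attr_dim=8, num_items=100, d=32):
        super().__init__()
        self.attr_net = nn.Sequential(nn.Linear(attr_dim, d), nn.ReLU())
        self.item_emb = nn.Embedding(num_items, d)
        self.gru = nn.GRU(d, d, batch_first=True)

    def encode(self, attrs, items):
        h0 = self.attr_net(attrs).unsqueeze(0)      # attributes set the
        _, h = self.gru(self.item_emb(items), h0)   # sequence encoder's state
        return h.squeeze(0)

    def forward(self, a1, s1, a2, s2, same_class):
        # contrastive loss: pull same-class pairs together, push others apart
        d = torch.norm(self.encode(a1, s1) - self.encode(a2, s2), dim=1)
        margin = 1.0
        return (same_class * d.pow(2)
                + (1 - same_class) * (margin - d).clamp(min=0).pow(2)).mean()

net = TwinAttributedSeqNet()
loss = net(torch.randn(4, 8), torch.randint(100, (4, 12)),
           torch.randn(4, 8), torch.randint(100, (4, 12)),
           torch.tensor([1., 0., 1., 0.]))
print(float(loss))
```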
Online Multi-horizon Transaction Metric Estimation with Multi-modal Learning in Payment Networks
Yeh, Chin-Chia Michael, Zhuang, Zhongfang, Wang, Junpeng, Zheng, Yan, Ebrahimi, Javid, Mercer, Ryan, Wang, Liang, Zhang, Wei
Predicting metrics associated with entities' transactional behavior within payment processing networks is essential for system monitoring. Multivariate time series, aggregated from the past transaction history, can provide valuable insights for such prediction. The general multivariate time series prediction problem has been well studied and applied across several domains, including manufacturing, medicine, and entomology. However, new domain-related challenges associated with the data, such as concept drift and multi-modality, have surfaced, in addition to the real-time requirements of handling payment transaction data at scale. In this work, we study the problem of multivariate time series prediction for estimating transaction metrics associated with entities in the payment transaction database. We propose a model with five unique components to estimate the transaction metrics from multi-modality data. Four of these components capture interaction, temporal, scale, and shape perspectives, and the fifth component fuses these perspectives together. We also propose a hybrid offline/online training scheme to address concept drift in the data and fulfill the real-time requirements. Combining the estimation model with a graphical user interface, the prototype transaction metric estimation system has demonstrated its potential benefit as a tool for improving a payment processing company's system monitoring capability.
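The component-plus-fusion design and the hybrid training scheme can be hedged into a short PyTorch sketch: two stand-in branches (for the paper's four perspectives), a fusion layer producing multi-horizon forecasts, and a loop that keeps updating on streaming batches. All sizes and branch choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MetricEstimator(nn.Module):
    """Sketch of the multi-branch design described above: separate small
    branches for different views of the input window, fused into a joint
    multi-horizon forecast. The branches here are illustrative stand-ins for
    the paper's interaction/temporal/scale/shape components."""
    def __init__(self, n_vars=4, window=24, horizons=3, d=32):
        super().__init__()
        self.temporal = nn.GRU(n_vars, d, batch_first=True)  # temporal view
        self.scale = nn.Linear(n_vars, d)                     # scale view
        self.fuse = nn.Linear(2 * d, horizons * n_vars)       # fusion
        self.horizons, self.n_vars = horizons, n_vars

    def forward(self, x):                 # x: (batch, window, n_vars)
        _, h = self.temporal(x)
        s = self.scale(x.mean(dim=1))     # per-variable average magnitude
        out = self.fuse(torch.cat([h.squeeze(0), s], dim=1))
        return out.view(-1, self.horizons, self.n_vars)

model = MetricEstimator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# hybrid scheme: heavy offline fit, then cheap online steps on new data
# (both are stand-ins; the toy loop below mimics the online half on streams)
for step in range(5):
    x, y = torch.randn(16, 24, 4), torch.randn(16, 3, 4)
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
print(model(torch.randn(1, 24, 4)).shape)  # torch.Size([1, 3, 4])
```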