Towards a Near Universal Time Series Data Mining Tool: Introducing the Matrix Profile
arXiv.org Artificial Intelligence
by Chin-Chia Michael Yeh. Doctor of Philosophy, Graduate Program in Computer Science, University of California, Riverside, September 2018. Dr. Eamonn Keogh, Chairperson.
The last decade has seen a flurry of research on all-pairs-similarity-search (or self-join) for text, DNA, and a handful of other data types, and these systems have been applied to many diverse data mining problems. Surprisingly, however, little progress has been made on this problem for time series subsequences. In this thesis, we introduce a near universal time series data mining tool called the matrix profile, which solves the all-pairs-similarity-search problem and caches the output in an easy-to-access fashion. The proposed algorithm is not only parameter-free, exact, and scalable, but also applicable to both single and multidimensional time series. By building time series data mining methods on top of the matrix profile, many time series data mining tasks (e.g., motif discovery, discord discovery, shapelet discovery, semantic segmentation, and clustering) can be solved efficiently. Because the same matrix profile can be shared by a diverse set of time series data mining methods, it is a versatile, compute-once, use-many-times data structure. We demonstrate the utility of the matrix profile for many time series data mining problems, including motif discovery, discord discovery, weakly labeled time series classification, and representation learning, on domains as diverse as seismology, entomology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring, and medicine. We hope the matrix profile is not the end but the beginning of many more time series data mining projects.
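To make the idea of a cached all-pairs-similarity-search concrete, the following is a minimal brute-force sketch in Python with NumPy. It is not the scalable algorithm the thesis proposes. It only illustrates what such a self-join could store: for every subsequence, the distance to its nearest neighbor and that neighbor's position. The function names, the use of z-normalized Euclidean distance, the window length `m`, and the exclusion-zone width are assumptions made for illustration and are not stated in the abstract.

```python
import numpy as np


def znorm(x):
    """Z-normalize a 1-D array; a constant window maps to all zeros."""
    sd = x.std()
    if sd < 1e-12:
        return np.zeros_like(x, dtype=float)
    return (x - x.mean()) / sd


def naive_matrix_profile(ts, m):
    """Brute-force self-join (quadratic time, illustration only).

    Returns, for each length-m subsequence, the distance to its nearest
    non-overlapping neighbor and the start index of that neighbor.
    """
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial self-matches
    profile = np.full(n, np.inf)
    index = np.full(n, -1, dtype=int)
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        lo, hi = max(0, i - excl), min(n, i + excl + 1)
        d[lo:hi] = np.inf  # ignore the subsequence itself and close overlaps
        j = int(np.argmin(d))
        profile[i], index[i] = d[j], j
    return profile, index


# Synthetic example: random noise with the same pattern planted twice.
rng = np.random.default_rng(0)
ts = rng.normal(size=100)
pattern = np.array([0.0, 1.0, 3.0, 1.0, -2.0, -3.0, -1.0, 0.0])
ts[20:28] = pattern
ts[70:78] = pattern

profile, index = naive_matrix_profile(ts, m=8)
motif = int(np.argmin(profile))    # lowest value: best-conserved repeat
discord = int(np.argmax(profile))  # highest value: most unusual subsequence
print("motif at", motif, "matches", int(index[motif]))
print("discord candidate at", discord)
```

Once `profile` and `index` are computed, both lookups at the end reuse the same arrays: the minimum points at a motif pair and the maximum at a discord candidate. This is the compute-once, use-many-times property the abstract describes, shown here on a toy scale.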
Nov-5-2018
- Country:
- Europe (0.27)
- Asia > Middle East (0.14)
- North America > United States
- Pennsylvania > Allegheny County
- Pittsburgh (0.13)
- California > Riverside County
- Riverside (0.24)
- Genre:
- Research Report > New Finding (0.45)
- Industry:
- Transportation (1.00)
- Media > Music (1.00)
- Materials (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Energy > Oil & Gas
- Upstream (0.87)
- Technology:
- Information Technology
- Data Science > Data Mining (1.00)
- Artificial Intelligence
- Representation & Reasoning > Search (0.87)
- Machine Learning
- Statistical Learning > Clustering (1.00)
- Performance Analysis > Accuracy (0.93)
- Inductive Learning (0.92)
- Learning Graphical Models (0.92)
- Neural Networks > Deep Learning (0.67)