Pattern Recognition
Liu
Time series and signals are attracting more attention across statistics, machine learning and pattern recognition as they appear widely in industry, especially in sensor- and IoT-related research and applications, but few advances have been achieved in effective time series visual analytics and interaction due to their temporal dimensionality and complex dynamics. Inspired by recent efforts to use network metrics to characterize time series for classification, we present an approach to visualize time series as complex networks based on the first-order Markov process in their temporal ordering. In contrast to classical bar charts, line plots and other statistics-based graphs, our approach delivers a more intuitive visualization that better preserves both the temporal dependency and frequency structures. It provides a natural inverse operation to map the graph back to raw signals, making it possible to use graph statistics to characterize time series for better visual exploration and statistical analysis. Our experimental results suggest that the approach is effective on various tasks, such as pattern discovery and classification, on both synthetic and real time series and sensor data.
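As a rough illustration of the idea of reading a time series as a network through a first-order Markov process, the sketch below bins the values into quantile states and counts transitions between consecutive states. The quantile binning, the function name `markov_transition_matrix` and the parameter `n_bins` are assumptions made for this example and are not taken from the abstract.

```python
import numpy as np

def markov_transition_matrix(series, n_bins=4):
    """Return (states, weights) for a first-order Markov view of a series.

    Assumption: states are quantile bins of the raw values. weights[i, j]
    estimates the probability of moving from state i to state j in one step
    and can be read as the weighted adjacency matrix of a directed graph.
    """
    x = np.asarray(series, dtype=float)
    # Interior quantile edges split the values into n_bins states.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    states = np.digitize(x, edges)  # integers in 0 .. n_bins - 1
    counts = np.zeros((n_bins, n_bins))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transition stay all-zero.
    weights = np.divide(counts, row_sums,
                        out=np.zeros_like(counts), where=row_sums > 0)
    return states, weights

# Hypothetical usage on a short periodic signal.
states, weights = markov_transition_matrix([0, 1, 2, 3, 0, 1, 2, 3])
```

Each non-zero entry of `weights` corresponds to a directed edge between two states, so standard graph statistics can be computed on the result.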
Abou-Zleikha
We present a demonstration of PaTux, an authoring tool for designing levels in the SuperTux game by combining patterns. PaTux allows game designers to specify the design of their levels using patterns extracted from training level samples. The Non-negative Matrix Factorisation (NMF) method is utilised to approximate pattern and weight matrices from the training data. The patterns are visualised for designers to choose from, and the changes made to the level structure are visualised in real time. The designer can also specify the weight of each pattern, permitting exploration of a wider variety. The data used to train the model can also be specified by the designer, resulting in a new set of learned patterns. The system also suggests variations for a given design. When the designer is satisfied with the design, the system allows the resulting level to be loaded and played in the game.
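To make the factorisation step concrete, here is a minimal sketch of NMF using the standard multiplicative update rules, so that a non-negative data matrix is approximated by a weight matrix times a pattern matrix. The data layout (one row per training sample), the function name `nmf_multiplicative` and all parameters are assumptions for illustration; the abstract does not say which NMF solver PaTux uses.

```python
import numpy as np

def nmf_multiplicative(V, n_patterns, n_iter=200, eps=1e-9, seed=0):
    """Approximate V (samples x features) as weights @ patterns.

    Uses multiplicative updates for the Frobenius-norm objective.
    Both factors stay non-negative because every update multiplies
    by a non-negative ratio.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = V.shape
    weights = rng.random((n_samples, n_patterns)) + eps
    patterns = rng.random((n_patterns, n_features)) + eps
    for _ in range(n_iter):
        patterns *= (weights.T @ V) / (weights.T @ weights @ patterns + eps)
        weights *= (V @ patterns.T) / (weights @ patterns @ patterns.T + eps)
    return weights, patterns

# Hypothetical usage: 6 samples with 8 features, 2 patterns.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 8))
weights, patterns = nmf_multiplicative(V, n_patterns=2)
```

Editing a row of `weights` and multiplying by `patterns` gives a new sample, which loosely mirrors the idea of a designer adjusting pattern weights.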
OPP-Miner: Order-preserving sequential pattern mining
Wu, Youxi, Hu, Qian, Li, Yan, Guo, Lei, Zhu, Xingquan, Wu, Xindong
A time series is a collection of measurements in chronological order. Discovering patterns from time series is useful in many domains, such as stock analysis, disease detection, and weather forecasting. To discover patterns, existing methods often convert time series data into another form, such as a nominal/symbolic format, to reduce dimensionality, which inevitably distorts the data values. Moreover, existing methods mainly neglect the order relationships between time series values. To tackle these issues, inspired by order-preserving matching, this paper proposes an Order-Preserving sequential Pattern (OPP) mining method, which represents patterns based on the order relationships of the time series data. An inherent advantage of such a representation is that the trend of a time series can be represented by the relative order of the values underlying the time series data. To obtain frequent trends in time series, we propose the OPP-Miner algorithm to mine patterns with the same trend (sub-sequences with the same relative order). OPP-Miner employs filtration and verification strategies to calculate the support and uses a pattern fusion strategy to generate candidate patterns. To compress the result set, we also study finding the maximal OPPs. Experiments validate that OPP-Miner is not only efficient and scalable but can also discover similar sub-sequences in time series. In addition, case studies show that our algorithms have high utility in analyzing the COVID-19 epidemic by identifying critical trends and improving clustering performance.
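The notion of "sub-sequences with the same relative order" can be sketched as follows. This is a brute-force illustration of what the support of an order-preserving pattern means, not the OPP-Miner algorithm itself (it uses no filtration, verification or pattern fusion). The function names and the tie-breaking rule (earlier position ranks lower) are assumptions for this example.

```python
def relative_order(window):
    """Rank of each value within the window, 1 = smallest.

    Assumption: ties are broken by position, since sorted() is stable.
    """
    order = sorted(range(len(window)), key=lambda i: window[i])
    ranks = [0] * len(window)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return tuple(ranks)

def count_support(series, pattern):
    """Count contiguous windows whose relative order equals `pattern`."""
    m = len(pattern)
    target = tuple(pattern)
    return sum(
        1
        for start in range(len(series) - m + 1)
        if relative_order(series[start:start + m]) == target
    )

# The windows [1, 3, 2] and [2, 4, 3] both follow the trend (1, 3, 2).
support = count_support([1, 3, 2, 4, 3, 5], (1, 3, 2))  # support == 2
```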
Mental Stress Detection using Data from Wearable and Non-wearable Sensors: A Review
Arsalan, Aamir, Anwar, Syed Muhammad, Majid, Muhammad
This paper presents a comprehensive review of methods covering significant subjective and objective human stress detection techniques available in the literature. The methods for measuring human stress responses include subjective questionnaires (developed by psychologists) and objective markers observed using data from wearable and non-wearable sensors. In particular, wearable sensor-based methods commonly use data from electroencephalography, electrocardiogram, galvanic skin response, electromyography, electrodermal activity, heart rate, heart rate variability, and photoplethysmography, both individually and in multimodal fusion strategies. In contrast, methods based on non-wearable sensors include strategies such as analyzing pupil dilation and speech, smartphone data, eye movement, body posture, and thermal imaging. Whenever an individual encounters a stressful situation, physiological, physical, or behavioral changes are induced that help in coping with the challenge at hand. A wide range of studies has attempted to establish a relationship between these stressful situations and the response of human beings by using different kinds of psychological, physiological, physical, and behavioral measures. Motivated by the lack of a definitive verdict about the relationship of human stress with these different kinds of markers, a detailed survey of human stress detection methods is conducted in this paper. In particular, we explore how stress detection methods can benefit from artificial intelligence utilizing relevant data from various sources. This review is intended as a reference document that provides guidelines for future research, enabling effective detection of human stress conditions.
Memory Efficient Tries for Sequential Pattern Mining
Hosseininasab, Amin, van Hoeve, Willem-Jan, Cire, Andre A.
Sequential Pattern Mining (SPM) is a prominent topic in unsupervised learning that aims at finding frequent patterns of events in sequential datasets. Frequent patterns have a wide range of applications and are used, for example, to develop novel association rules, aid supervised learners in prediction tasks, and design effective recommender systems. While supervised learning algorithms have enjoyed great success in using large datasets for better prediction accuracy, unsupervised algorithms such as SPM still face challenges in scalability and memory requirements. In particular, the two dominant SPM methodologies, Apriori (Agrawal et al., 1994) and prefix-projection (Han et al., 2001), suffer from the explosion of candidate patterns or require the entire large training dataset to fit in memory. This memory bottleneck is aggravated by the steady increase of dataset size in recent years, which may contain a larger and richer set of frequent patterns to be investigated. It is thus vital for the success of SPM algorithms that they adapt to their rapidly growing data environment. This paper investigates the role of dataset models in the time and memory efficiency of SPM algorithms.
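As a hedged illustration of why a trie can be a compact dataset model, the sketch below stores a small sequence database in a plain prefix trie, so that shared prefixes are held once. It is a generic textbook trie, not the specific memory-efficient structure studied in the paper; `TrieNode`, `build_trie` and `count_nodes` are hypothetical names.

```python
class TrieNode:
    """One event in a prefix trie; `count` is how many sequences pass through."""

    def __init__(self):
        self.children = {}
        self.count = 0

def build_trie(sequences):
    """Insert every sequence so that shared prefixes share nodes."""
    root = TrieNode()
    for seq in sequences:
        node = root
        for event in seq:
            node = node.children.setdefault(event, TrieNode())
            node.count += 1
    return root

def count_nodes(node):
    """Number of nodes in the trie, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())

# Three sequences hold 8 events in total but need only 5 non-root nodes.
database = [["a", "b", "c"], ["a", "b", "d"], ["a", "c"]]
root = build_trie(database)
n_nodes = count_nodes(root)
```

The per-node counts give the support of each prefix directly, which is one reason prefix-sharing structures are attractive for frequent pattern queries.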
Tsetlin Machine for Solving Contextual Bandit Problems
Seraj, Raihan, Sharma, Jivitesh, Granmo, Ole-Christoffer
This paper introduces an interpretable contextual bandit algorithm using Tsetlin Machines, which solve complex pattern recognition tasks using propositional logic. The proposed bandit learning algorithm relies on straightforward bit manipulation, thus simplifying computation and interpretation. We then present a mechanism for performing Thompson sampling with the Tsetlin Machine, given its non-parametric nature. Our empirical analysis shows that the Tsetlin Machine as a base contextual bandit learner outperforms other popular base learners on eight out of nine datasets. We further analyze the interpretability of our learner, investigating how arms are selected based on propositional expressions that model the context.
Incremental Mining of Frequent Serial Episodes Considering Multiple Occurrence
Guyet, Thomas, Zhang, Wenbin, Bifet, Albert
The need to analyze information from streams arises in a variety of applications. One of the fundamental research directions is to mine sequential patterns over data streams. Current studies mine series of items based on the existence of the pattern in transactions but pay no attention to series of itemsets and their multiple occurrences. Patterns over a window of an itemset stream, together with their multiple occurrences, however, make it possible to recognize essential characteristics of the patterns and inter-relationships among them that cannot be identified by existing item- and existence-based studies. In this paper, we study this new sequential pattern mining problem and propose a corresponding efficient sequential miner with novel strategies to prune the search space efficiently. Experiments on both real and synthetic data show the utility of our approach.
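The difference between existence-based counting and multiple-occurrence counting can be illustrated with a small sketch. The occurrence semantics used here (greedy, non-overlapping occurrences of a serial episode whose elements are itemsets) is an assumption chosen for simplicity; the abstract does not define how the paper counts occurrences, and the function names are hypothetical.

```python
def count_non_overlapping(window, episode):
    """Greedily count non-overlapping occurrences of a serial episode.

    `window` is a list of itemsets (sets) in stream order and `episode`
    is a list of itemsets that must be matched in order, each one being
    a subset of some later itemset in the window.
    """
    count = 0
    pos = 0  # index of the next episode element to match
    for itemset in window:
        if episode[pos] <= itemset:  # subset test
            pos += 1
            if pos == len(episode):
                count += 1
                pos = 0
    return count

def occurs(window, episode):
    """Existence-based view: does the episode occur at least once?"""
    return count_non_overlapping(window, episode) > 0

window = [{"a"}, {"b"}, {"a", "c"}, {"b"}, {"b"}]
episode = [{"a"}, {"b"}]
n_occurrences = count_non_overlapping(window, episode)  # 2 under this semantics
```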
Signal Quality Assessment of Photoplethysmogram Signals using Quantum Pattern Recognition and lightweight CNN Architecture
Chatterjee, Tamaghno, Ghosh, Aayushman, Sarkar, Sayan
Photoplethysmography (PPG) signals carry physiological information related to cardiorespiratory health. However, during recording, PPG signals are easily corrupted by motion artifacts and body movements, leading to noisy, poor-quality signals. Ensuring high-quality signals is therefore necessary to extract cardiorespiratory information accurately. Although several rule-based and machine-learning (ML)-based approaches exist for PPG signal quality estimation, the efficacy of those algorithms is questionable. Thus, this work proposes a lightweight CNN architecture for signal quality assessment employing a novel Quantum pattern recognition (QPR) technique. The proposed algorithm is validated on manually annotated data obtained from the University of Queensland database. A total of 28,366 signal segments of 5 s each are preprocessed and transformed into image files of 20 x 500 pixels. The image files are used as input to the 2D CNN architecture. The developed model classifies the PPG signal as 'good' or 'bad' with an accuracy of 98.3%, a sensitivity of 99.3%, a specificity of 94.5% and an F1-score of 98.9%. Finally, the performance of the proposed framework is validated against the noisy PPG database collected with the 'Welltory app'. Even in a noisy environment, the proposed architecture proved its competence. Experimental analysis concludes that a slim architecture along with a novel spatio-temporal pattern recognition technique improves the system's performance. Hence, the proposed approach can be useful for classifying good and bad PPG signals in a resource-constrained wearable implementation.
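For readers who want to relate the reported accuracy, sensitivity, specificity and F1-score to one another, the following sketch computes all four from the cells of a binary confusion matrix using their standard definitions. The counts in the usage line are made up for illustration and are not the paper's data; treating 'good' as the positive class is also an assumption.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

With imbalanced classes, accuracy and F1 can both be high while specificity lags, which is why the abstract reports all four figures.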
Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge
Dognin, Pierre, Melnyk, Igor, Mroueh, Youssef, Padhi, Inkit, Rigotti, Mattia, Ross, Jarret, Schiff, Yair, Young, Richard A., Belgodere, Brian
Image captioning has recently demonstrated impressive progress, largely owing to the introduction of neural network algorithms trained on curated datasets like MS-COCO. Work in this field is often motivated by the promise of deploying captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets limits the utility of systems trained on these datasets as an assistive technology in real-world settings, such as helping visually impaired people navigate and accomplish everyday tasks. This gap motivated the introduction of the novel VizWiz dataset, which consists of images taken by the visually impaired and captions that have useful, task-oriented information. In an attempt to help the machine learning computer vision field realize its promise of producing technologies that have positive social impact, the curators of the VizWiz dataset host several competitions, including one for image captioning. This work details the theory and engineering from our winning submission to the 2020 captioning competition. Our work provides a step towards improved assistive image captioning systems. This article appears in the special track on AI & Society.
Fuzzy Segmentations of a String
Kostanyan, Armen, Harmandayan, Arevik
This article discusses a particular case of the data clustering problem, where it is necessary to find groups of adjacent text segments of the appropriate length that match a fuzzy pattern represented as a sequence of fuzzy properties. To solve this problem, a heuristic algorithm for finding a sufficiently large number of solutions is proposed. The key idea of the proposed algorithm is the use of a prefix structure to track the process of mapping text segments to fuzzy properties. An important special case of the text segmentation problem is the fuzzy string matching problem, in which adjacent text segments have unit length and, accordingly, the fuzzy pattern is a sequence of fuzzy properties of text characters. It is proven that the heuristic segmentation algorithm in this case finds all text segments that match the fuzzy pattern. Finally, we consider the problem of finding the best segmentation of the entire text based on a fuzzy pattern, which is solved using dynamic programming.
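The unit-length special case (fuzzy string matching) can be sketched with a simple sliding window. This is a brute-force illustration rather than the prefix-structure heuristic of the article. The membership functions, the use of the minimum as the combined match degree, the threshold and all names are assumptions made for the example.

```python
def vowel(ch):
    """Hypothetical fuzzy property: degree to which a character is a vowel."""
    return {"a": 1.0, "e": 1.0, "i": 1.0, "o": 1.0, "u": 1.0, "y": 0.5}.get(ch, 0.0)

def consonant(ch):
    """Hypothetical fuzzy property: complement of `vowel`."""
    return 1.0 - vowel(ch)

def fuzzy_matches(text, pattern, threshold=0.5):
    """Return (start, degree) for every window matching the fuzzy pattern.

    `pattern` is a sequence of membership functions, one per character.
    Assumption: the degree of a window is the minimum membership over
    its characters, and a window matches if the degree >= threshold.
    """
    m = len(pattern)
    hits = []
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        degree = min(prop(ch) for prop, ch in zip(pattern, window))
        if degree >= threshold:
            hits.append((start, degree))
    return hits

# "by" matches weakly (degree 0.5) and "te" matches fully (degree 1.0).
hits = fuzzy_matches("byte", [consonant, vowel])
```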