In this blog post, I will give a brief overview of an important subfield of data mining called pattern mining. Pattern mining consists of using or developing data mining algorithms to discover interesting, unexpected and useful patterns in databases. Pattern mining algorithms can be applied to various types of data, such as transaction databases, sequence databases, streams, strings, spatial data and graphs. They can be designed to discover various types of patterns: subgraphs, associations, indirect associations, trends, periodic patterns, sequential rules, lattices, sequential patterns, high-utility patterns, etc. But what is an interesting pattern? For example, some researchers define an interesting pattern as a pattern that appears frequently in a database.
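To make the "appears frequently" definition concrete, here is a minimal sketch of frequent itemset counting on a toy transaction database. The function name `frequent_itemsets`, the `min_support` and `max_size` parameters, and the grocery data are all assumptions made for illustration. The approach is plain brute-force enumeration, not an optimized algorithm such as Apriori or FP-Growth.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: count} for itemsets found in at least min_support transactions.

    Brute force: every subset of each transaction (up to max_size items) is counted,
    so this is only practical for small examples.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, duplicates removed
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy database: each transaction is a list of purchased items.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

result = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

With a minimum support of 2, every single item and every pair is frequent here, while the triple (bread, butter, milk) appears in only one transaction and is dropped.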
We present a novel algorithm, Westfall-Young light, for detecting patterns, such as itemsets and subgraphs, which are statistically significantly enriched in one of two classes. Our method corrects rigorously for multiple hypothesis testing and correlations between patterns through the Westfall-Young permutation procedure, which empirically estimates the null distribution of pattern frequencies in each class via permutations. In our experiments, Westfall-Young light dramatically outperforms the current state-of-the-art approach in terms of both runtime and memory efficiency on popular real-world benchmark datasets for pattern mining. The key to this efficiency is that unlike all existing methods, our algorithm neither needs to solve the underlying frequent itemset mining problem anew for each permutation nor needs to store the occurrence list of all frequent patterns. Westfall-Young light opens the door to significant pattern mining on large datasets that previously led to prohibitive runtime or memory costs.
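For readers unfamiliar with the Westfall-Young permutation procedure, the following is a minimal sketch of its naive form, not of Westfall-Young light itself. It assumes the patterns have already been enumerated and are given as 0/1 occurrence lists, uses a two-sided Fisher exact test as the per-pattern test, and reports adjusted p-values as the fraction of label permutations whose minimum p-value is at most the observed one. The names `fisher_p`, `pattern_p_values`, and `westfall_young_adjust`, and the toy data, are hypothetical.

```python
import random
from math import comb


def fisher_p(a, x, n1, n):
    """Two-sided Fisher exact p-value for a 2x2 table.

    a: pattern occurrences in class 1, x: total occurrences,
    n1: size of class 1, n: total number of samples.
    """
    lo, hi = max(0, x - (n - n1)), min(x, n1)
    total = comb(n, x)
    probs = {k: comb(n1, k) * comb(n - n1, x - k) / total for k in range(lo, hi + 1)}
    cutoff = probs[a] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))


def pattern_p_values(occurrences, labels):
    """One p-value per pattern; occurrences[j][i] is 1 if pattern j occurs in sample i."""
    n, n1 = len(labels), sum(labels)
    p_values = []
    for column in occurrences:
        x = sum(column)
        a = sum(o for o, y in zip(column, labels) if y == 1)
        p_values.append(fisher_p(a, x, n1, n))
    return p_values


def westfall_young_adjust(occurrences, labels, n_permutations=1000, seed=0):
    """Naive single-step min-p adjustment: recompute every p-value per permutation."""
    rng = random.Random(seed)
    observed = pattern_p_values(occurrences, labels)
    shuffled = list(labels)
    min_p = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # permuting labels simulates the null hypothesis
        min_p.append(min(pattern_p_values(occurrences, shuffled)))
    return [sum(m <= p for m in min_p) / n_permutations for p in observed]


# Hypothetical data: 20 samples, 10 per class, two patterns.
labels = [1] * 10 + [0] * 10
pattern_a = [1] * 10 + [0] * 10  # occurs only in class 1
pattern_b = [1, 0] * 10          # occurs equally often in both classes
occurrences = [pattern_a, pattern_b]

adjusted = westfall_young_adjust(occurrences, labels)
print(adjusted)
```

This sketch recomputes every pattern's p-value for every permutation, which is the kind of repeated work that becomes prohibitive when the number of patterns is large.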
As a decision analyst, I am delighted to see all the excitement about the ever-increasing amounts of data available. But when I see much of what data scientists are doing to find interesting patterns (descriptive analytics, which is just looking in the rearview mirror) and to forecast (predictive analytics, ideally our GPS), I see little attention paid to the real business decisions that might be informed by these insights (decision analytics, helping our clients steer a better course). Decision analysis tells us that data, and the interesting patterns we may find in it, have zero practical value until they inform decisions made in the real world. Many of us pay lip service to being in the business of decisions. But how much of your professional work is about understanding and addressing real business decisions vs. just finding insights in the data?
The use of Advanced Analytics in business has rocketed to the top of every corporate agenda. Advanced Analytics is the examination of data using sophisticated analytical methods and tools to generate new information, recognise patterns, and predict outcomes and their respective probabilities. At ClearPeaks, we use analytical techniques like Regression, Forecasting, Clustering, Classification, Optimisation and Machine Learning to recognise patterns and translate predicted outcomes into business actions that optimise future results.
Module 3 consists of two lessons: Lessons 5 and 6. In Lesson 5, we discuss mining sequential patterns. We will learn several popular and efficient sequential pattern mining methods, including an Apriori-based sequential pattern mining method, GSP; a vertical data format-based sequential pattern mining method, SPADE; and a pattern-growth-based sequential pattern mining method, PrefixSpan. We will also learn how to directly mine closed sequential patterns. In Lesson 6, we will study concepts and methods for mining spatiotemporal and trajectory patterns as one kind of pattern mining application.