In this blog post, I give a brief overview of pattern mining, an important subfield of data mining. Pattern mining consists of using or developing data mining algorithms to discover interesting, unexpected and useful patterns in databases. These algorithms can be applied to various types of data, such as transaction databases, sequence databases, streams, strings, spatial data and graphs. They can be designed to discover various types of patterns: subgraphs, associations, indirect associations, trends, periodic patterns, sequential rules, lattices, sequential patterns, high-utility patterns, etc. But what is an interesting pattern? For example, some researchers define an interesting pattern as one that appears frequently in a database.
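To make the frequency-based notion of interestingness concrete, here is a minimal Python sketch that counts itemsets in a toy transaction database and keeps those reaching a minimum support count. The function name `frequent_itemsets`, the `min_support` and `max_size` parameters, and the grocery data are all illustrative assumptions. The brute-force enumeration is only workable for tiny inputs; practical algorithms such as Apriori or FP-Growth prune the search space instead.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return every itemset (up to max_size items) found in at least min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy transaction database.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

frequent = frequent_itemsets(transactions, min_support=3)
for itemset, support in sorted(frequent.items()):
    print(itemset, support)
```

With a support threshold of 3, only the three single items qualify, because each pair occurs in just two transactions. Lowering `min_support` to 2 would also report the pairs.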
The use of Advanced Analytics in business has rocketed to the top of every corporate agenda. Advanced Analytics is the examination of data using sophisticated analytical methods and tools to generate new information, recognise patterns, and predict outcomes and their respective probabilities. At ClearPeaks, we use analytical techniques like Regression, Forecasting, Clustering, Classification, Optimisation and Machine Learning to recognise patterns and translate predicted outcomes into business actions to optimise future results.
We present a novel algorithm, Westfall-Young light, for detecting patterns, such as itemsets and subgraphs, which are statistically significantly enriched in one of two classes. Our method corrects rigorously for multiple hypothesis testing and correlations between patterns through the Westfall-Young permutation procedure, which empirically estimates the null distribution of pattern frequencies in each class via permutations. In our experiments, Westfall-Young light dramatically outperforms the current state-of-the-art approach in terms of both runtime and memory efficiency on popular real-world benchmark datasets for pattern mining. The key to this efficiency is that unlike all existing methods, our algorithm neither needs to solve the underlying frequent itemset mining problem anew for each permutation nor needs to store the occurrence list of all frequent patterns. Westfall-Young light opens the door to significant pattern mining on large datasets that previously led to prohibitive runtime or memory costs.
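For readers unfamiliar with the permutation procedure the abstract builds on, the following is a rough Python sketch of the plain Westfall-Young min-p idea: shuffle the class labels many times, record the smallest p-value over all patterns in each shuffle, and take a low quantile of those minima as the corrected significance threshold. This is not the Westfall-Young light algorithm; it is the naive baseline that recomputes every pattern's p-value for each permutation. The function names, the one-sided Fisher-style test, the quantile rule and the toy data are all assumptions made for illustration.

```python
import random
from math import comb


def fisher_p(a, x, n1, n):
    """One-sided hypergeometric tail P(X >= a).

    The pattern occurs in x of n samples, n1 samples belong to class 1,
    and a of the pattern's occurrences fall in class 1.
    """
    total = comb(n, n1)
    upper = min(x, n1)
    return sum(comb(x, k) * comb(n - x, n1 - k) for k in range(a, upper + 1)) / total


def westfall_young_threshold(occurrences, labels, alpha=0.05, n_perm=1000, seed=0):
    """Estimate a corrected p-value threshold from label permutations.

    occurrences: dict mapping each pattern to the set of sample indices containing it.
    labels: list of 0/1 class labels, one per sample.
    """
    rng = random.Random(seed)
    n = len(labels)
    n1 = sum(labels)
    shuffled = list(labels)
    min_pvalues = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute class labels, keep pattern occurrences fixed
        min_pvalues.append(min(
            fisher_p(sum(shuffled[i] for i in idx), len(idx), n1, n)
            for idx in occurrences.values()
        ))
    min_pvalues.sort()
    # Assumed rule: use the alpha-quantile of the permutation minima as the threshold.
    return min_pvalues[int(alpha * n_perm)]


# Hypothetical toy input: two patterns over six samples.
occurrences = {"A": {0, 1, 2}, "B": {1, 3, 4, 5}}
labels = [1, 1, 1, 0, 0, 0]
threshold = westfall_young_threshold(occurrences, labels, n_perm=200)
print(threshold)
```

Under this sketch, a pattern would be reported as significant when its p-value on the original labels falls strictly below `threshold`. The cost of looping over every pattern in every permutation is the kind of overhead the abstract says its method avoids.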
Emojis have been widely used in textual communications as a new way to convey nonverbal cues. An interesting observation is that emoji usage patterns vary among different users. In this paper, we investigate the correlation between user personality traits and their emoji usage patterns, particularly overall usage amounts and preferences for specific emojis. To achieve this goal, we build a large Twitter dataset that includes 352,245 users and over 1.13 billion tweets, associated with calculated personality traits and emoji usage patterns. Our correlation and emoji prediction results provide insights into how diverse personalities lead to varied emoji usage patterns, as well as into their potential for emoji recommendation tasks.
As a decision analyst, I find it delightful to see all the excitement about the ever-increasing amounts of data available. But when I see much of what data scientists are doing to find interesting patterns (descriptive analytics, which is just looking in the rear-view mirror) and to forecast (predictive analytics, ideally our GPS), I see little attention paid to the real business decisions that might be informed by these insights (decision analytics, helping our clients steer a better course). Decision analysis tells us that data, and the interesting patterns we may find in it, have zero practical value until they inform decisions made in the real world. Many of us pay lip service to being in the business of decisions. But how much of your professional work is about understanding and addressing real business decisions, versus just finding insights in the data?