Data mining enables users to analyse, classify, and discover correlations among data. One of the crucial tasks in this process is association rule learning. Another important part of data mining is anomaly detection, the search for items or events that do not conform to an expected pattern. These non-conforming items are termed anomalies, and they often translate into critical and actionable information in various application fields. This concept can be best understood with the supermarket example.
A number of data mining algorithms have recently been developed that greatly facilitate the processing and interpretation of large stores of data. One example is association rule mining, which discovers correlations between items in transactional databases. The Apriori algorithm is one such association rule mining algorithm. Under this algorithm, candidate patterns that receive sufficient support from the database (that is, occur sufficiently often) are considered for transformation into a rule. This type of algorithm works well for complete data with discrete values.
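The support-based filtering described above can be illustrated with a minimal Python sketch of level-wise frequent itemset mining in the style of Apriori. It is an illustration under stated assumptions, not the implementation evaluated in the abstract: the function name `apriori`, the absolute-count threshold `min_support`, and the toy transactions are all hypothetical, and rule generation from the frequent itemsets is omitted.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset whose support count
    is at least min_support (an absolute count, assumed here)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in frequent
                   for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count how many transactions contain each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical supermarket baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=2)
```

With these baskets and a threshold of 2, the pair {milk, butter} occurs only once and is discarded, so the three-item candidate is pruned without a database scan.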
We propose scalable methods to execute counting queries in machine learning applications. To achieve memory and computational efficiency, we abstract counting queries and their context such that the counts can be aggregated as a stream. We demonstrate the performance and scalability of the resulting approach on random queries, and through extensive experimentation using Bayesian network learning and association rule mining. Our methods significantly outperform commonly used ADtrees and hash tables, and are practical alternatives for processing large-scale data.
Almost all approaches to association rule mining use a single minimum support, a technique that either rules out all infrequent itemsets or suffers from the bottleneck of generating and examining too many candidate large itemsets. In this paper we combine two well-known algorithms, DIC and MSApriori, to obtain a faster and more effective solution for mining association rules among items with different support values. Experiments show that the new algorithm outperforms both MSApriori and DIC.
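To make the idea of different support values concrete, the following Python sketch applies a per-item minimum support, taking an itemset's threshold to be the lowest value among its items, as in the multiple-minimum-support model that MSApriori is based on. It is a brute-force illustration only: it is neither MSApriori nor the combined DIC-based algorithm proposed here, and the names `frequent_with_mis`, `mis`, `max_size`, and the toy data are assumptions made for the example.

```python
from itertools import combinations


def frequent_with_mis(transactions, mis, max_size=3):
    """Brute-force search for itemsets that meet item-specific thresholds.

    mis maps each item to its minimum support (a fraction of transactions).
    An itemset qualifies when its support reaches the smallest threshold
    among its items.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(sorted(mis), k):
            itemset = frozenset(combo)
            # Fraction of transactions that contain the whole itemset.
            support = sum(1 for t in transactions if itemset <= t) / n
            threshold = min(mis[item] for item in itemset)
            if support >= threshold:
                result[itemset] = support
    return result


# Hypothetical baskets: "caviar" is rare, so it gets a lower threshold.
baskets = [
    {"bread", "milk"},
    {"bread", "caviar"},
    {"bread", "milk"},
    {"milk"},
]
thresholds = {"bread": 0.5, "milk": 0.5, "caviar": 0.25}
found = frequent_with_mis(baskets, thresholds)
```

Under a single minimum support of 0.5, every itemset containing the rare item would be lost; with the per-item thresholds above, {bread, caviar} is retained while {milk, caviar}, which never occurs, is still rejected.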