Association Learning

#artificialintelligence

Association learning (association rule learning) is a rule-based machine learning and data mining technique that discovers interesting relations between variables, or features, in a dataset. Rather than measuring degrees of similarity, association rule learning identifies hidden correlations in databases by applying a measure of interestingness to generate association rules that can then be applied to new searches.
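Three widely used interestingness measures for a rule X -> Y are support, confidence, and lift. The sketch below computes them over a handful of invented supermarket baskets. The data and the function names are purely illustrative and do not come from any particular library.

```python
from typing import Collection, FrozenSet, List

# Toy market-basket transactions, invented for illustration only.
transactions: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]


def support(itemset: Collection[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    s = frozenset(itemset)
    return sum(1 for t in db if s <= t) / len(db)


def confidence(antecedent: Collection[str], consequent: Collection[str],
               db: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, db) / support(antecedent, db)


def lift(antecedent: Collection[str], consequent: Collection[str],
         db: List[FrozenSet[str]]) -> float:
    """Confidence relative to the consequent's base rate; > 1 suggests positive association."""
    return confidence(antecedent, consequent, db) / support(consequent, db)


rule = ({"diapers"}, {"beer"})
print(round(support(rule[0] | rule[1], transactions), 3))  # 0.6
print(round(confidence(*rule, transactions), 3))           # 0.75
print(round(lift(*rule, transactions), 3))                 # 1.25
```

A lift above 1 means that baskets containing diapers contain beer more often than baskets do in general.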


SCR-Apriori for Mining 'Sets of Contrasting Rules'

arXiv.org Machine Learning

In this paper, we propose an efficient algorithm for mining the novel 'Set of Contrasting Rules' pattern (SCR-pattern), which consists of several association rules. This pattern is of high interest because of the guaranteed quality of the rules that form it and its ability to discover useful knowledge. However, there has been no efficient algorithm for mining SCR-patterns. We propose the SCR-Apriori algorithm, which produces the same set of SCR-patterns as the state-of-the-art approach but is less computationally expensive. We also show experimentally that, by incorporating knowledge about the pattern structure into the Apriori algorithm, SCR-Apriori can significantly prune the search space of frequent itemsets to be analysed.

I. INTRODUCTION

Association rule learning is a popular technique in data mining [1]. However, finding high-quality rules is known to be difficult [2]. This issue is even more significant in domains where the obtained knowledge must be highly reliable (for example, in medicine). In addition, association rule mining techniques usually generate a huge number of rules that have to be analysed by a human in order to choose the meaningful and useful ones [3].
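For context, the sketch below implements the classic level-wise Apriori algorithm that SCR-Apriori builds on. It is plain Apriori, not the SCR-Apriori variant: it contains none of the pattern-structure pruning the paper describes. The function name and the toy baskets are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


def apriori(transactions: Iterable[Iterable[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every frequent itemset mapped to its support (classic Apriori)."""
    db = [frozenset(t) for t in transactions]
    n = len(db)

    def frequent(candidates):
        # One pass over the data counts every candidate of the current level.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    level = frequent({frozenset([i]) for t in db for i in t})  # frequent 1-itemsets
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=0.6)
for itemset, sup in sorted(frequent_itemsets.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step is where search-space reduction happens. Approaches such as SCR-Apriori add further, pattern-specific pruning on top of this step.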


Fast Dimensional Analysis for Root Cause Investigation in Large-Scale Service Environment

arXiv.org Machine Learning

Root cause analysis in a large-scale production environment is challenging because of the complexity of services running across global data centers. Owing to the distributed nature of a large-scale system, the various hardware, software, and tooling logs are often maintained separately, which makes it difficult to review the logs jointly to detect issues. Another challenge in reviewing the logs to identify issues is scale: there can easily be millions of entities, each with hundreds of features. In this paper we present a fast dimensional analysis framework that automates root cause analysis on structured logs with improved scalability. We first explore item-sets, i.e., groups of feature values, that identify groups of samples with sufficient support for the target failures, using the Apriori algorithm and a subsequent improvement, FP-Growth. These algorithms were designed for frequent item-set mining and association rule learning over transactional databases. After applying them to structured logs, we select the item-sets that are most specific to the target failures based on lift. Using a large-scale real-time database, we propose pre- and post-processing techniques and parallelism to further speed up the analysis. We have successfully rolled out this approach for root cause investigation in a large-scale infrastructure. We also present the setup and results from multiple production use cases in this paper.
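The sketch below illustrates the item-set selection step described above: structured log rows become sets of "feature=value" items, and item-sets are ranked by lift toward a failure label. It makes several assumptions. Candidates are enumerated by brute force rather than with Apriori or FP-Growth. The support threshold applies only to failing rows. The feature names `kernel` and `rack` are hypothetical. It is not the paper's implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Row = Dict[str, str]


def rank_by_lift(rows: List[Row], failed: List[bool], min_fail_support: float = 0.5,
                 max_len: int = 2) -> List[Tuple[FrozenSet[str], float, float]]:
    """Rank feature-value item-sets by lift toward the failure label.

    fail_support = fraction of failing rows that contain the item-set
    lift         = P(failure | item-set) / P(failure)
    """
    n_fail = sum(failed)
    if n_fail == 0:
        raise ValueError("need at least one failing row")
    base_rate = n_fail / len(rows)
    # Each row becomes a set of "feature=value" items.
    item_rows = [frozenset(f"{k}={v}" for k, v in r.items()) for r in rows]
    all_items = sorted(set().union(*item_rows))
    scored = []
    for size in range(1, max_len + 1):
        for combo in combinations(all_items, size):
            s = frozenset(combo)
            covered = [i for i, items in enumerate(item_rows) if s <= items]
            if not covered:  # e.g. two values of the same feature
                continue
            fail_cov = sum(1 for i in covered if failed[i])
            if fail_cov / n_fail < min_fail_support:
                continue
            lift = (fail_cov / len(covered)) / base_rate
            scored.append((s, lift, fail_cov / n_fail))
    # Stable sort: among equal lifts, smaller item-sets keep their earlier position.
    return sorted(scored, key=lambda x: x[1], reverse=True)


rows = [
    {"kernel": "5.4", "rack": "r1"},
    {"kernel": "5.4", "rack": "r2"},
    {"kernel": "5.4", "rack": "r1"},
    {"kernel": "4.19", "rack": "r1"},
    {"kernel": "4.19", "rack": "r2"},
    {"kernel": "4.19", "rack": "r2"},
]
failed = [True, True, True, False, False, False]
for itemset, lift, fail_support in rank_by_lift(rows, failed):
    print(sorted(itemset), round(lift, 3), round(fail_support, 3))
```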


101 Machine Learning Algorithms for Data Science with Cheat Sheets

#artificialintelligence

The algorithms have been sorted into 9 groups: Anomaly Detection, Association Rule Learning, Classification, Clustering, Dimensionality Reduction, Ensemble, Neural Networks, Regression, Regularization. In this post, you'll find 101 machine learning algorithms, including useful infographics to help you know when to use each one (where available). Each of the accordion drop-downs is embeddable if you want to take it with you. All you have to do is click the small 'embed' button in the lower left-hand corner and copy and paste the iframe. All we ask is that you link back to this post.


RuDaS: Synthetic Datasets for Rule Learning and Evaluation Tools

arXiv.org Artificial Intelligence

Logical rules are a popular knowledge representation language in many domains, representing background knowledge and encoding information that can be derived from given facts in a compact form. However, rule formulation is a complex process that requires deep domain expertise, and is further challenged by today's often large, heterogeneous, and incomplete knowledge graphs. Several approaches for learning rules automatically, given a set of input example facts, have been proposed over time, including, more recently, neural systems. Yet, the area is missing adequate datasets and evaluation approaches: existing datasets often resemble toy examples that neither cover the various kinds of dependencies between rules nor allow for testing scalability. We present a tool for generating different kinds of datasets and for evaluating rule learning systems.
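To illustrate how a single logical rule compactly encodes facts that can be derived from given facts, the sketch below applies the rule grandparentOf(X, Z) <- parentOf(X, Y), parentOf(Y, Z) to a tiny set of triples. The relation names and the triple format are hypothetical. The sketch does not reflect the dataset format used by RuDaS.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, object)


def apply_grandparent_rule(facts: Set[Fact]) -> Set[Fact]:
    """Forward-chain one rule: grandparentOf(X, Z) <- parentOf(X, Y), parentOf(Y, Z)."""
    parent = {(s, o) for r, s, o in facts if r == "parentOf"}
    derived = {("grandparentOf", x, z)
               for x, y in parent
               for y2, z in parent
               if y == y2}
    return facts | derived


facts = {("parentOf", "ann", "bob"), ("parentOf", "bob", "carl")}
print(sorted(apply_grandparent_rule(facts)))
```

Rule learning works in the opposite direction: given facts such as these, a system tries to recover a rule like this one.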


101 ML Algorithms

#artificialintelligence

The algorithms have been sorted into 9 groups: Anomaly Detection, Association Rule Learning, Classification, Clustering, Dimensionality Reduction, Ensemble, Neural Networks, Regression, Regularization. In this post, you'll find 101 machine learning algorithms, including useful cheat sheets to help you know when to use each one (where available). At Data Science Dojo, our mission is to make data science (machine learning, in this case) available to everyone. Whether you join our data science bootcamp, read our blog, or watch our tutorials, we want everyone to have the opportunity to learn data science. Each accordion drop-down is also embeddable if you want to take it with you.


On the Trade-off Between Consistency and Coverage in Multi-label Rule Learning Heuristics

arXiv.org Machine Learning

Recently, several authors have advocated the use of rule learning algorithms to model multi-label data, as rules are interpretable and can be comprehended, analyzed, or qualitatively evaluated by domain experts. Many rule learning algorithms employ a heuristic-guided search for rules that model regularities contained in the training data, and it is commonly accepted that the choice of heuristic has a significant impact on the predictive performance of the learner. Whereas the properties of rule learning heuristics have been studied in the realm of single-label classification, no such work takes into account the particularities of multi-label classification. This is surprising, as the quality of multi-label predictions is usually assessed in terms of a variety of different, potentially competing, performance measures that cannot all be optimized by a single learner at the same time. In this work, we show empirically that it is crucial to trade off the consistency and coverage of rules differently, depending on which multi-label measure a model should optimize. Based on these findings, we emphasize the need for configurable learners that can flexibly use different heuristics. As our experiments reveal, the choice of heuristic is not straightforward, because a search for rules that optimize a measure locally usually does not result in a model that maximizes that measure globally.
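One common way to make such a trade-off configurable is a parameterized heuristic. The sketch below uses the F-measure of a rule's precision (read here as consistency) and recall (read here as coverage), simplified to a single label. It is a generic illustration and not necessarily one of the heuristics evaluated in the paper. The counts are made up.

```python
def f_measure(p: int, n: int, P: int, beta: float) -> float:
    """F-measure of a single rule for one label.

    p: positive examples the rule covers
    n: negative examples the rule covers
    P: all positive examples for the label
    precision = p / (p + n)  (consistency)
    recall    = p / P        (coverage)
    Small beta favours precision; large beta favours recall.
    """
    if p == 0:
        return 0.0  # the rule covers no positive examples
    precision = p / (p + n)
    recall = p / P
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


P = 100
narrow_rule = (10, 0)   # highly consistent, low coverage
broad_rule = (60, 20)   # less consistent, high coverage
for beta in (0.1, 1.0, 10.0):
    print(beta,
          round(f_measure(*narrow_rule, P, beta), 3),
          round(f_measure(*broad_rule, P, beta), 3))
```

With beta = 0.1 the narrow, consistent rule scores higher. With beta = 10 the broad rule wins, which mirrors the trade-off the abstract describes.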


Understanding Association Rule Learning & Its Role In Data Mining

#artificialintelligence

Data mining enables users to analyse, classify, and discover correlations among data. One of the crucial tasks in this process is association rule learning. Another important part of data mining is anomaly detection, which is the search for items or events that do not conform to a familiar pattern. Such nonconforming items or events are termed anomalies, and they often point to critical, actionable information in various application fields. The concept of association rule learning is best understood through the supermarket example.


GuideR: a guided separate-and-conquer rule learning in classification, regression, and survival settings

arXiv.org Machine Learning

This article presents GuideR, a user-guided rule induction algorithm that overcomes the largest limitation of existing methods: the inability to introduce the user's preferences or domain knowledge into the rule learning process. Automatic selection of attributes and attribute ranges often leads to rules that contain no interesting information. We propose an induction algorithm that takes the user's requirements into account. Our method uses the sequential covering approach and is suitable for classification, regression, and survival analysis problems. The effectiveness of the algorithm in all these tasks has been verified experimentally, confirming that guided rule induction is a powerful data analysis tool.
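The sketch below illustrates generic sequential covering (separate-and-conquer) for classification. A rule is grown greedily by adding attribute = value conditions until it is pure, and the examples it covers are removed before the next rule is learned. The `required` parameter is a hypothetical, much-simplified stand-in for user guidance: every rule is forced to start with the given conditions. This is not GuideR's actual algorithm or interface.

```python
from typing import Dict, List, Optional, Tuple

Example = Dict[str, str]
Condition = Tuple[str, str]  # (attribute, value), meaning attribute == value


def covers(rule: List[Condition], x: Example) -> bool:
    return all(x.get(a) == v for a, v in rule)


def learn_one_rule(examples: List[Example], labels: List[str], target: str,
                   required: Optional[List[Condition]] = None) -> List[Condition]:
    """Greedily add conditions that raise precision for `target` (ties: more positives)."""
    rule = list(required or [])  # hypothetical user-imposed starting conditions
    covered = [i for i, x in enumerate(examples) if covers(rule, x)]
    while True:
        pos = sum(labels[i] == target for i in covered)
        if pos == len(covered):  # pure rule, or it covers nothing
            break
        cur_prec = pos / len(covered)
        # A candidate must strictly improve precision; (prec, inf) rejects equal precision.
        best, best_key, best_cov = None, (cur_prec, float("inf")), covered
        used = {a for a, _ in rule}
        for a in examples[0]:
            if a in used:
                continue
            for v in {examples[i][a] for i in covered}:
                cov = [i for i in covered if examples[i][a] == v]
                p = sum(labels[i] == target for i in cov)
                if p == 0:
                    continue
                key = (p / len(cov), p)
                if key > best_key:
                    best, best_key, best_cov = (a, v), key, cov
        if best is None:  # no condition improves the rule
            break
        rule.append(best)
        covered = best_cov
    return rule


def sequential_covering(examples: List[Example], labels: List[str], target: str,
                        required: Optional[List[Condition]] = None) -> List[List[Condition]]:
    rules, remaining = [], list(range(len(examples)))
    while any(labels[i] == target for i in remaining):
        rule = learn_one_rule([examples[i] for i in remaining],
                              [labels[i] for i in remaining], target, required)
        cov = [i for i in remaining if covers(rule, examples[i])]
        if not any(labels[i] == target for i in cov):
            break  # the rule explains no remaining positives
        rules.append(rule)
        remaining = [i for i in remaining if i not in cov]  # "separate": drop covered examples
    return rules


examples = [
    {"outlook": "sunny", "wind": "weak"},
    {"outlook": "sunny", "wind": "strong"},
    {"outlook": "rain", "wind": "weak"},
    {"outlook": "rain", "wind": "strong"},
    {"outlook": "overcast", "wind": "strong"},
]
labels = ["yes", "no", "yes", "no", "yes"]
print(sequential_covering(examples, labels, "yes"))
print(sequential_covering(examples, labels, "yes", required=[("wind", "strong")]))
```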


MachineX: Understanding FP-Tree Construction - DZone AI

#artificialintelligence

In my previous blog post, MachineX: Why No One Uses an Apriori Algorithm for Association Rule Learning, we discussed one of the first algorithms in association rule learning, the Apriori algorithm. Although it is simple and clear, it has some weaknesses, as discussed in that post. A significant improvement over the Apriori algorithm is the FP-Growth algorithm. To understand how FP-Growth finds frequent itemsets, we first have to understand the data structure it uses to do so, the FP-Tree, which is the focus of this post. Put simply, an FP-Tree is a compressed representation of the input data.
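The construction can be sketched in two passes over the data. The first pass counts item frequencies and discards infrequent items. The second pass inserts each transaction's remaining items, sorted by descending frequency, into a prefix tree with counts, so that transactions sharing a prefix share nodes. A header table links every node that holds a given item. Breaking frequency ties alphabetically is an assumption made to keep the output deterministic. The class and function names are illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]):
        self.item = item          # None for the root
        self.count = 0            # transactions whose path passes through this node
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(transactions: Iterable[Iterable[str]],
                  min_count: int) -> Tuple[FPNode, Dict[str, List[FPNode]]]:
    db = [set(t) for t in transactions]
    # Pass 1: count item frequencies and keep only frequent items.
    freq = Counter(item for t in db for item in t)
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = defaultdict(list)  # item -> all nodes holding it
    # Pass 2: insert frequent items in descending frequency (ties: alphabetical).
    for t in db:
        items = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:  # new branch: no earlier transaction shared this prefix
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def show(node: FPNode, depth: int = 0) -> None:
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
root, header = build_fp_tree(baskets, min_count=3)
show(root)
```

Summing the counts of an item's nodes in the header table recovers that item's support. FP-Growth mines frequent itemsets directly from this structure, without generating candidates as Apriori does.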