GuideR: a guided separate-and-conquer rule learning in classification, regression, and survival settings
Sikora, Marek, Wróbel, Łukasz, Gudyś, Adam
Marek Sikora (a, b), Łukasz Wróbel (a, b), Adam Gudyś (a)
(a) Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
(b) Institute of Innovative Technologies, EMAG, Leopolda 31, 40-189 Katowice, Poland

Abstract

This article presents GuideR, a user-guided rule induction algorithm that overcomes the largest limitation of existing methods: the inability to incorporate the user's preferences or domain knowledge into the rule learning process. Automatic selection of attributes and attribute ranges often yields rules that contain no interesting information. We propose an induction algorithm which takes the user's requirements into account. Our method uses the sequential covering approach and is suitable for classification, regression, and survival analysis problems. The effectiveness of the algorithm in all these tasks has been verified experimentally, confirming guided rule induction to be a powerful data analysis tool.

Introduction

Sequential covering rule induction algorithms can be used for both predictive and descriptive purposes [1, 2, 3, 4]. Despite the development of increasingly sophisticated versions of these algorithms [5, 6], the main principle remains unchanged and involves two phases: rule growing, in which conditions are added to the rule premise, and rule pruning, in which some of these conditions are removed. Compared with other machine learning methods, rule sets obtained by sequential covering, also known as the separate-and-conquer (SnC) strategy, are characterized by good predictive as well as descriptive capabilities. If only predictive capability is considered, superior results can often be obtained using other methods. However, data models obtained this way are much less comprehensible than rule sets.
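The two-phase principle described above can be illustrated with a minimal sketch of a generic separate-and-conquer loop for a single target class. This is not the GuideR algorithm itself: the function names, the use of precision as the rule quality measure, the equality-only conditions, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, str], str]  # (attribute values, class label)
Rule = Dict[str, str]                 # conjunction of "attribute == value" conditions


def covers(rule: Rule, attrs: Dict[str, str]) -> bool:
    """A rule covers an example when every condition in its premise holds."""
    return all(attrs.get(a) == v for a, v in rule.items())


def precision(rule: Rule, data: List[Example], target: str) -> float:
    """Fraction of covered examples that belong to the target class (assumed quality measure)."""
    covered = [label for attrs, label in data if covers(rule, attrs)]
    return sum(l == target for l in covered) / len(covered) if covered else 0.0


def grow(data: List[Example], target: str) -> Rule:
    """Growing phase: greedily add the condition that most improves precision."""
    rule: Rule = {}
    while precision(rule, data, target) < 1.0:
        candidates = sorted({(a, v) for attrs, _ in data if covers(rule, attrs)
                             for a, v in attrs.items() if a not in rule})
        if not candidates:
            break
        best = max(candidates,
                   key=lambda c: precision({**rule, c[0]: c[1]}, data, target))
        if precision({**rule, best[0]: best[1]}, data, target) <= precision(rule, data, target):
            break  # no condition improves the rule any further
        rule[best[0]] = best[1]
    return rule


def prune(rule: Rule, data: List[Example], target: str) -> Rule:
    """Pruning phase: drop conditions whose removal does not lower precision."""
    rule = dict(rule)
    for a in list(rule):
        shorter = {k: v for k, v in rule.items() if k != a}
        if shorter and precision(shorter, data, target) >= precision(rule, data, target):
            rule = shorter
    return rule


def sequential_covering(data: List[Example], target: str) -> List[Rule]:
    """Learn rules one at a time, removing the positive examples each rule covers."""
    rules: List[Rule] = []
    remaining = list(data)
    while any(label == target for _, label in remaining):
        rule = prune(grow(remaining, target), remaining, target)
        if not any(covers(rule, a) and l == target for a, l in remaining):
            break
        rules.append(rule)
        remaining = [(a, l) for a, l in remaining
                     if not (covers(rule, a) and l == target)]
    return rules


# Hypothetical toy data set, used only to exercise the sketch.
toy: List[Example] = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
rules = sequential_covering(toy, "play")
```

In this sketch every attribute and value is chosen automatically by the quality measure alone, which is exactly the behaviour that a user-guided approach aims to constrain with preferences or domain knowledge.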
When rules are learned for descriptive purposes, association rule induction [12, 13, 14] or subgroup discovery [15, 6] algorithms are applied. The former produces a very large number of rules, which must then be limited by filtering according to rule interestingness measures [16, 17, 18]. Rule sets obtained by subgroup discovery, in turn, are characterized by worse predictive abilities than those generated by the standard sequential covering approach. Therefore, if the main objective is to create a prediction system with a comprehensible data model, sequential covering rule induction algorithms are the most sensible solution.
Jun-5-2018