AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
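As a minimal sketch of the idea in the quotation, a decision tree can be written as nested property tests that end in a yes/no leaf. The attribute names (`patrons`, `hungry`) and the tree shape below are hypothetical, chosen only for illustration.

```python
# Illustrative only: attribute names and tree shape are made up for this sketch.
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    decision: bool  # the yes/no output


@dataclass
class Node:
    attribute: str               # property tested at this node
    branches: Dict[str, "Tree"]  # one subtree per attribute value


Tree = Union[Leaf, Node]


def classify(tree: Tree, example: Dict[str, str]) -> bool:
    """Follow the branch matching each tested property until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.decision


# Hypothetical tree: say yes if some tables are taken,
# or if the place is full and we are hungry.
wait_tree = Node("patrons", {
    "none": Leaf(False),
    "some": Leaf(True),
    "full": Node("hungry", {"yes": Leaf(True), "no": Leaf(False)}),
})
```

Because every path ends in `True` or `False`, the tree as a whole computes a Boolean function of the input properties.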

DriveML: An R Package for Driverless Machine Learning

Putatunda, Sayan, Ubrangala, Dayananda, Rama, Kiran, Kondapalli, Ravi

arXiv.org Machine Learning, May-1-2020

In recent years, the concept of automated machine learning has become very popular. Automated Machine Learning (AutoML) mainly refers to the automated methods for model selection and hyper-parameter optimization of various algorithms such as random forests, gradient boosting, neural networks, etc. In this paper, we introduce a new package, DriveML, for automated machine learning. DriveML helps in implementing some of the pillars of an automated machine learning pipeline, such as automated data preparation, feature engineering, model building and model explanation, by running a function instead of writing lengthy R code. The DriveML package is available on CRAN. We compare the DriveML package with other relevant packages on CRAN/GitHub and find that DriveML performs the best across different parameters. We also provide an illustration by applying the DriveML package with default configuration on a real world dataset. Overall, the main benefits of DriveML are development time savings, fewer developer errors, optimal tuning of machine learning models, and reproducibility.

dataset, driveml package, machine learning, (13 more...)

arXiv.org Machine Learning

2005.00478

Country:

Europe > Austria > Vienna (0.14)
Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > India (0.04)

Genre: Research Report (0.69)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.51)

Why Your Company Needs White-Box Models in Enterprise Data Science - AI Trends

#artificialintelligence, Apr-24-2020, 20:26:55 GMT

AI is having a profound impact on customer experience, revenue, operations, risk management and other business functions across multiple industries. When fully operationalized, AI and Machine Learning (ML) enable organizations to make data-driven decisions with unprecedented levels of speed, transparency, and accountability. This dramatically accelerates digital transformation initiatives delivering greater performance and a competitive edge to organizations. ML projects in data science labs tend to adopt black-box approaches that generate minimal actionable insights and result in a lack of accountability in the data-driven decision-making process. Today with the advent of AutoML 2.0 platforms, a white-box model approach is becoming increasingly important and possible.

automl 2, enterprise data science, transparency, (14 more...)

#artificialintelligence

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.73)

Sparse Oblique Decision Tree for Power System Security Rules Extraction and Embedding

Hou, Qingchun, Zhang, Ning, Kirschen, Daniel S., Du, Ershun, Cheng, Yaohua, Kang, Chongqing

arXiv.org Machine Learning, Apr-20-2020

Increasing the penetration of variable generation has a substantial effect on the operational reliability of power systems. The higher level of uncertainty that stems from this variability makes it more difficult to determine whether a given operating condition will be secure or insecure. Data-driven techniques provide a promising way to identify security rules that can be embedded in economic dispatch models to keep power system operating states secure. This paper proposes using a sparse weighted oblique decision tree to learn accurate, understandable, and embeddable security rules that are linear and can be extracted as sparse matrices using a recursive algorithm. These matrices can then be easily embedded as security constraints in power system economic dispatch calculations using the Big-M method. Tests on several large datasets with high renewable energy penetration demonstrate the effectiveness of the proposed method. In particular, the sparse weighted oblique decision tree outperforms the state-of-the-art weighted oblique decision tree while keeping the security rules simple. When embedded in the economic dispatch, these rules significantly increase the percentage of secure states and reduce the average solution time.
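
The Big-M embedding mentioned in the abstract can be illustrated with the standard disjunctive formulation. This is a generic sketch, not the paper's exact model: the symbols $A_\ell$, $b_\ell$, $z_\ell$, $M$, and the set $\mathcal{S}$ of secure leaves are notation assumed here. Suppose each secure leaf $\ell$ of the oblique tree is reached when the linear rules $A_\ell x \le b_\ell$ along its path hold. Requiring the operating point $x$ to fall in some secure leaf can then be written as:

```latex
\begin{aligned}
A_\ell x &\le b_\ell + M\,(1 - z_\ell)\,\mathbf{1}, && \ell \in \mathcal{S},\\
\sum_{\ell \in \mathcal{S}} z_\ell &= 1, \qquad z_\ell \in \{0, 1\}.
\end{aligned}
```

When $z_\ell = 1$ the inequalities of leaf $\ell$ are enforced; when $z_\ell = 0$ a sufficiently large constant $M$ leaves them slack, so exactly one secure leaf is active.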

decision tree, oblique decision tree, security rule, (13 more...)

arXiv.org Machine Learning

2004.09579

Country:

North America > United States > Washington > King County > Seattle (0.14)
Asia > China > Chongqing Province > Chongqing (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Industry:

Machinery > Industrial Machinery (1.00)
Energy > Renewable > Wind (0.46)
Energy > Power Industry > Utilities (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Exploiting Categorical Structure Using Tree-Based Methods

Lucena, Brian

arXiv.org Artificial Intelligence, Apr-15-2020

Standard methods of using categorical variables as predictors either endow them with an ordinal structure or assume they have no structure at all. However, categorical variables often possess structure that is more complicated than a linear ordering can capture. We develop a mathematical framework for representing the structure of categorical variables and show how to generalize decision trees to make use of this structure. This approach is applicable to methods such as Gradient Boosted Trees which use a decision tree as the underlying learner. We show results on weather data to demonstrate the improvement yielded by this approach.

decision tree, partition, terrain, (14 more...)

arXiv.org Artificial Intelligence

2004.07383

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > District of Columbia (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)

Diverse Instances-Weighting Ensemble based on Region Drift Disagreement for Concept Drift Adaptation

Liu, Anjin, Lu, Jie, Zhang, Guangquan

arXiv.org Machine Learning, Apr-13-2020

Concept drift refers to changes in the distribution of underlying data and is an inherent property of evolving data streams. Ensemble learning, with dynamic classifiers, has proved to be an efficient method of handling concept drift. However, the best way to create and maintain ensemble diversity with evolving streams is still a challenging problem. In contrast to estimating diversity via inputs, outputs, or classifier parameters, we propose a diversity measurement based on whether the ensemble members agree on the probability of a regional distribution change. In our method, estimations over regional distribution changes are used as instance weights. Constructing different region sets through different schemes will lead to different drift estimation results, thereby creating diversity. The classifiers that disagree the most are selected to maximize diversity. Accordingly, an instance-based ensemble learning algorithm, called the diverse instance weighting ensemble (DiwE), is developed to address concept drift for data stream classification problems. Evaluations on various synthetic and real-world data stream benchmarks show the effectiveness and advantages of the proposed algorithm.

algorithm, concept drift, dataset, (13 more...)

arXiv.org Machine Learning

doi: 10.1109/TNNLS.2020.2978523

2004.0581

Country:

Oceania > Australia > New South Wales > Sydney (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Brazil > Maranhão (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.67)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Learning under Concept Drift: A Review

Lu, Jie, Liu, Anjin, Dong, Fan, Gu, Feng, Gama, Joao, Zhang, Guangquan

arXiv.org Machine Learning, Apr-13-2020

Concept drift describes unforeseeable changes in the underlying distribution of streaming data over time. Concept drift research involves the development of methodologies and techniques for drift detection, understanding and adaptation. Data analysis has revealed that machine learning in a concept drift environment will result in poor learning results if the drift is not addressed. To help researchers identify which research topics are significant and how to apply related techniques in data analysis tasks, a high quality, instructive review of current research developments and trends in the concept drift field is needed. In addition, due to the rapid development of concept drift research in recent years, the methodologies of learning under concept drift have become noticeably systematic, unveiling a framework which has not been mentioned in the literature. This paper reviews over 130 high quality publications in concept drift related research areas, analyzes up-to-date developments in methodologies and techniques, and establishes a framework of learning under concept drift including three main components: concept drift detection, concept drift understanding, and concept drift adaptation. This paper lists and discusses 10 popular synthetic datasets and 14 publicly available benchmark datasets used for evaluating the performance of learning algorithms aiming at handling concept drift. Also, concept drift related research directions are covered and discussed. By providing state-of-the-art knowledge, this survey will directly support researchers in their understanding of research developments in the field of learning under concept drift.

algorithm, concept drift, drift detection, (12 more...)

arXiv.org Machine Learning

doi: 10.1109/TKDE.2018.2876857

2004.05785

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > China > Beijing > Beijing (0.04)
(9 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.67)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)

Adversarial Attacks on Machine Learning Cybersecurity Defences in Industrial Control Systems

Anthi, Eirini, Williams, Lowri, Rhode, Matilda, Burnap, Pete, Wedgbury, Adam

arXiv.org Machine Learning, Apr-10-2020

The proliferation and application of machine learning based Intrusion Detection Systems (IDS) have allowed for more flexibility and efficiency in the automated detection of cyber attacks in Industrial Control Systems (ICS). However, the introduction of such IDSs has also created an additional attack vector; the learning models may also be subject to cyber attacks, otherwise referred to as Adversarial Machine Learning (AML). Such attacks may have severe consequences in ICS, as adversaries could potentially bypass the IDS. This could lead to delayed attack detection, which may result in infrastructure damage, financial loss, and even loss of life. This paper explores how adversarial learning can be used to target supervised models by generating adversarial samples using the Jacobian-based Saliency Map attack and exploring classification behaviours. The analysis also includes the exploration of how such samples can support the robustness of supervised models using adversarial training. An authentic power system dataset was used to support the experiments presented herein. Overall, the classification performance of two widely used classifiers, Random Forest and J48, decreased by 16 and 20 percentage points when adversarial samples were present. Their performances improved following adversarial training, demonstrating their robustness towards such attacks.

adversarial sample, classifier, dataset, (16 more...)

arXiv.org Machine Learning

2004.05005

Country:

South America > Uruguay > Durazno > Durazno (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > Mississippi (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)

Feature Partitioning for Robust Tree Ensembles and their Certification in Adversarial Scenarios

Calzavara, Stefano, Lucchese, Claudio, Marcuzzi, Federico, Orlando, Salvatore

arXiv.org Machine Learning, Apr-7-2020

Machine learning algorithms, however effective, are known to be vulnerable in adversarial scenarios where a malicious user may inject manipulated instances. In this work we focus on evasion attacks, where a model is trained in a safe environment and exposed to attacks at test time. The attacker aims at finding a minimal perturbation of a test instance that changes the model outcome. We propose a model-agnostic strategy that builds a robust ensemble by training its basic models on feature-based partitions of the given dataset. Our algorithm guarantees that the majority of the models in the ensemble cannot be affected by the attacker. We evaluated the proposed strategy on decision tree ensembles, and we also propose an approximate certification method for tree ensembles that efficiently assesses the minimal accuracy of a forest on a given dataset, avoiding the costly computation of evasion attacks. Experimental evaluation on publicly available datasets shows that the proposed strategy outperforms state-of-the-art adversarial learning algorithms against evasion attacks.
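
The majority guarantee described in the abstract can be illustrated with a simple counting argument. The sketch below is not the authors' algorithm: the round-robin split, the attacker budget `b`, and the choice of `2*b + 1` base models are assumptions made here. If each base model sees a disjoint group of features and the attacker can change at most `b` features, then at most `b` models can be affected.

```python
from typing import List, Sequence, Set


def partition_features(n_features: int, n_models: int) -> List[List[int]]:
    """Split feature indices into n_models disjoint groups (round-robin)."""
    return [list(range(i, n_features, n_models)) for i in range(n_models)]


def models_touched(groups: Sequence[Sequence[int]], perturbed: Set[int]) -> int:
    """Count the base models whose feature group contains a perturbed feature."""
    return sum(1 for g in groups if perturbed.intersection(g))


# Assumed setup: 10 features, attacker budget b = 2, so 2*b + 1 = 5 base models.
groups = partition_features(n_features=10, n_models=5)
touched = models_touched(groups, perturbed={0, 7})
```

With five disjoint groups and two perturbed features, at most two base models see a changed input, so a majority vote over the five is decided by unaffected models.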

artificial intelligence, attacker, machine learning, (15 more...)

arXiv.org Machine Learning

2004.03295

Country:

North America > United States (0.04)
Europe > Italy > Veneto > Venice (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

Adversarial Validation Approach to Concept Drift Problem in Automated Machine Learning Systems

Pan, Jing, Pham, Vincent, Dorairaj, Mohan, Chen, Huigang, Lee, Jeong-Yoon

arXiv.org Machine Learning, Apr-6-2020

In automated machine learning systems, concept drift in input data is one of the main challenges. It deteriorates model performance on new data over time. Previous research on concept drift mostly proposed model retraining after observing performance decreases. However, this approach is suboptimal because the system fixes the problem only after suffering from poor performance on new data. Here, we introduce an adversarial validation approach to concept drift problems in automated machine learning systems. With our approach, the system detects concept drift in new data before making inferences, trains a model, and produces predictions adapted to the new data. We show that our approach addresses concept drift effectively with the AutoML3 Lifelong Machine Learning challenge data as well as in Uber's internal automated machine learning system, MaLTA.
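
The general idea of adversarial validation can be sketched as follows: label old data 0 and new data 1, fit a classifier to tell them apart, and read its held-out accuracy as a drift signal. This is a minimal, single-feature illustration using a one-sided threshold classifier. It is not the system described in the abstract, and the function names are hypothetical.

```python
import random
from typing import List


def accuracy(t: float, xs: List[float], ys: List[int]) -> float:
    """Fraction of points where the rule 'x > t means new data' matches the label."""
    return sum((x > t) == bool(y) for x, y in zip(xs, ys)) / len(xs)


def fit_stump(xs: List[float], ys: List[int]) -> float:
    """Pick the threshold with the best training accuracy.

    One-sided for brevity: it only detects new data shifted upward.
    A real classifier would handle shifts in any direction.
    """
    return max(sorted(set(xs)), key=lambda t: accuracy(t, xs, ys))


def drift_score(old: List[float], new: List[float], seed: int = 0) -> float:
    """Held-out accuracy of an old-vs-new classifier.

    Values near 0.5 mean the two samples are hard to tell apart;
    values near 1.0 suggest the new data has drifted.
    """
    rng = random.Random(seed)
    data = [(x, 0) for x in old] + [(x, 1) for x in new]
    rng.shuffle(data)
    half = len(data) // 2
    fit, held = data[:half], data[half:]
    t = fit_stump([x for x, _ in fit], [y for _, y in fit])
    return accuracy(t, [x for x, _ in held], [y for _, y in held])
```

Calling `drift_score(old_values, new_values)` before scoring new data gives a check that does not depend on waiting for model performance to degrade.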

classifier, dataset, validation, (12 more...)

arXiv.org Machine Learning

2004.03045

Country:

Europe > Middle East > Malta (0.27)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report > Experimental Study (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

FastForest: Increasing Random Forest Processing Speed While Maintaining Accuracy

Yates, Darren, Islam, Md Zahidul

arXiv.org Machine Learning, Apr-6-2020

Random Forest remains one of Data Mining's most enduring ensemble algorithms, achieving well-documented levels of accuracy and processing speed, as well as regularly appearing in new research. However, with data mining now reaching the domain of hardware-constrained devices such as smartphones and Internet of Things (IoT) devices, there is a continued need for further research into algorithm efficiency to deliver greater processing speed without sacrificing accuracy. Our proposed FastForest algorithm delivers an average 24% increase in processing speed compared with Random Forest while maintaining (and frequently exceeding) its classification accuracy over tests involving 45 datasets. FastForest achieves this result through a combination of three optimising components: Subsample Aggregating ('Subbagging'), Logarithmic Split-Point Sampling and Dynamic Restricted Subspacing. Moreover, detailed testing of Subbagging sizes has found an optimal scalar delivering a positive mix of processing performance and accuracy.
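
Of the three components named in the abstract, subbagging is the simplest to illustrate: each tree is trained on a subsample drawn without replacement that is smaller than the full dataset. The sketch below shows only that sampling step. The function name and the `fraction` parameter are assumptions made here, and the abstract does not state the optimal subsample size, so none is implied.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subbag(data: Sequence[T], fraction: float, n_bags: int, seed: int = 0) -> List[List[T]]:
    """Draw n_bags subsamples, each without replacement, of size fraction * len(data)."""
    rng = random.Random(seed)
    size = max(1, int(len(data) * fraction))
    # random.sample draws without replacement, unlike the bootstrap used in bagging.
    return [rng.sample(list(data), size) for _ in range(n_bags)]


# Arbitrary illustrative values: 100 records, half-size subsamples, 3 trees.
bags = subbag(list(range(100)), fraction=0.5, n_bags=3)
```

Each tree then trains on fewer records than a bootstrap sample of the full dataset would contain, which is where the speed saving of this component would come from.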

dataset, fastforest, random forest, (14 more...)

arXiv.org Machine Learning

2004.02423

Country:

Oceania > Australia (0.14)
Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)
