AITopics | Ensemble Learning


Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
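As a rough illustration of why combining models can help, the sketch below computes the accuracy of a majority vote over n classifiers. It assumes (hypothetically) that every classifier is correct with the same probability p and that their errors are independent. Real ensembles rarely satisfy that independence assumption, so this is an idealized case, not a guarantee. The function name is illustrative only.

```python
from math import comb


def majority_vote_accuracy(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes each voter is right with probability p, independently of the rest.
    """
    if n % 2 == 0:
        raise ValueError("use an odd n so that ties cannot occur")
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, round(majority_vote_accuracy(n, 0.6), 5))
```

Under these assumptions, accuracy rises as more voters are added whenever p is above 0.5, and falls when p is below 0.5.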


benedekrozemberczki/awesome-gradient-boosting-papers

#artificialintelligence | Nov-18-2019, 14:53:48 GMT

How to Make AdaBoost.M1 Work for Weak Base Classifiers by Changing Only One Line of the Code (ECML 2002)

algorithm, classification, learning, (14 more...)

#artificialintelligence

Industry:

Education (0.47)
Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(3 more...)


Fair Adversarial Gradient Tree Boosting

Grari, Vincent, Ruf, Boris, Lamprier, Sylvain, Detyniecki, Marcin

arXiv.org Artificial Intelligence | Nov-18-2019

Fair classification has become an important topic in machine learning research. While most bias mitigation strategies focus on neural networks, we noticed a lack of work on fair classifiers based on decision trees even though they have proven very efficient. In an up-to-date comparison of state-of-the-art classification algorithms in tabular data, tree boosting outperforms deep learning [1]. For this reason, we have developed a novel approach of adversarial gradient tree boosting. The objective of the algorithm is to predict the output Y with gradient tree boosting while minimizing the ability of an adversarial neural network to predict the sensitive attribute S. The approach incorporates at each iteration the gradient of the neural network directly in the gradient tree boosting. We empirically assess our approach on 4 popular data sets and compare against state-of-the-art algorithms. The results show that our algorithm achieves a higher accuracy while obtaining the same level of fairness, as measured using a set of different common fairness definitions.

INTRODUCTION. Machine learning models are increasingly used in decision-making processes. In many fields of application, they generally deliver superior performance compared with conventional, deterministic algorithms. However, those models are mostly black boxes which are hard, if not impossible, to interpret.
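The abstract above describes folding the adversary's gradient into each boosting iteration. A minimal sketch of that idea follows, and it is not the authors' implementation. It assumes a log-loss predictor on the boosted score F(x), a single logistic unit standing in for the adversarial neural network, and a trade-off weight `lam`. All function and parameter names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fair_pseudo_residuals(scores, y, s, adv_w, adv_b, lam):
    """Negative gradient of L_Y - lam * L_S with respect to each score F(x_i).

    L_Y: log loss of the predictor, with P(Y=1) = sigmoid(F).
    L_S: log loss of the stand-in adversary, with P(S=1) = sigmoid(adv_w * F + adv_b).
    The next tree would be fitted to these residuals.
    """
    residuals = []
    for f, yi, si in zip(scores, y, s):
        grad_y = sigmoid(f) - yi                              # dL_Y / dF
        grad_s = (sigmoid(adv_w * f + adv_b) - si) * adv_w    # dL_S / dF
        residuals.append(-(grad_y - lam * grad_s))
    return residuals


if __name__ == "__main__":
    print(fair_pseudo_residuals([0.0, 1.5], [1, 0], [1, 0], adv_w=2.0, adv_b=0.0, lam=0.5))
```

With `lam = 0` the residuals reduce to the usual log-loss residuals y - sigmoid(F). Larger values of `lam` push the scores in directions that make S harder for the adversary to predict.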

algorithm, classifier, fairness, (15 more...)

arXiv.org Artificial Intelligence

1911.05369

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.48)
Overview > Innovation (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.93)
Banking & Finance (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)


Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models

Lengerich, Benjamin, Tan, Sarah, Chang, Chun-Hao, Hooker, Giles, Caruana, Rich

arXiv.org Artificial Intelligence | Nov-12-2019

Recent methods for training generalized additive models (GAMs) with pairwise interactions achieve state-of-the-art accuracy on a variety of datasets. Adding interactions to GAMs, however, introduces an identifiability problem: effects can be freely moved between main effects and interaction effects without changing the model predictions. In some cases, this can lead to contradictory interpretations of the same underlying function. This is a critical problem because a central motivation of GAMs is model interpretability. In this paper, we use the functional ANOVA decomposition to uniquely define interaction effects and thus produce identifiable additive models with purified interactions. To compute this decomposition, we present a fast, exact, mass-moving algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation. We apply this algorithm to several datasets and show large disparity, including contradictions, between the apparent and the purified effects. An important question in data analysis is whether two variables act in concert to affect an outcome. But this unconstrained additive model has fundamental flaws.
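To make the identifiability problem in the abstract above concrete, the sketch below moves "mass" out of a pairwise interaction table into the main effects and intercept without changing any prediction. It is a simplified illustration, not the paper's algorithm: it assumes uniform weights over the grid cells, in which case a single pass of row and column centering is enough. The names `purify_uniform` and `predict` are hypothetical.

```python
def predict(intercept, f1, f2, f12, i, j):
    """Additive model with one pairwise interaction on a discrete grid."""
    return intercept + f1[i] + f2[j] + f12[i][j]


def purify_uniform(intercept, f1, f2, f12):
    """Return an equivalent model whose interaction has zero row and column means.

    Assumes uniform weights over cells; inputs are left unmodified.
    """
    rows, cols = len(f1), len(f2)
    f1, f2 = list(f1), list(f2)
    f12 = [list(row) for row in f12]

    # Move each row mean of the interaction into the first main effect.
    for i in range(rows):
        mean = sum(f12[i]) / cols
        f1[i] += mean
        f12[i] = [v - mean for v in f12[i]]

    # Move each column mean of the interaction into the second main effect.
    for j in range(cols):
        mean = sum(f12[i][j] for i in range(rows)) / rows
        f2[j] += mean
        for i in range(rows):
            f12[i][j] -= mean

    # Center the main effects and move their means into the intercept.
    m1, m2 = sum(f1) / rows, sum(f2) / cols
    f1 = [v - m1 for v in f1]
    f2 = [v - m2 for v in f2]
    return intercept + m1 + m2, f1, f2, f12


if __name__ == "__main__":
    raw = (0.0, [0.0, 0.0], [0.0, 0.0], [[1.0, 2.0], [3.0, 6.0]])
    print(purify_uniform(*raw))
```

In the example, the raw model puts everything in the interaction table, while the purified model attributes most of the variation to the two main effects. Both give identical predictions.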

interaction, interaction effect, main effect, (16 more...)

arXiv.org Artificial Intelligence

1911.04974

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
Pacific Ocean (0.04)
(5 more...)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)


Machine learning identifies patients in need of end-of-life planning

#artificialintelligence | Nov-11-2019, 17:21:19 GMT

Penn Medicine researchers have developed a machine learning algorithm that identifies oncology patients at risk of short-term mortality who need end-of-life conversations with clinicians. In a study of 26,525 patients receiving outpatient oncology care, the algorithm accurately predicted patients with cancer who were at risk of six-month mortality using electronic health records, including whether a patient had high blood pressure as well as laboratory and electrocardiogram data. The study found that 51 percent of the patients the algorithm identified as "high priority" for end-of-life conversations died within six months vs. fewer than 4 percent in the "lower priority" group. "Our findings suggest that ML tools hold promise for integration into clinical workflows to ensure that patients with cancer have timely conversations about their goals and values," concludes the study, which was published in the journal JAMA Network Open. Researchers initially developed, validated and compared three ML models (gradient boosting, logistic regression and random forest) to estimate six-month mortality among patients seen in oncology clinics affiliated with a large academic cancer center. Of the three, the random forest model demonstrated the best predictive results.

algorithm, end-of-life planning, identify patient, (5 more...)

#artificialintelligence

Country: North America > United States > Pennsylvania (0.07)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.83)


Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success

#artificialintelligence | Nov-11-2019, 03:18:18 GMT

The sustained success of random forests has led naturally to the desire to better understand the statistical and mathematical properties of the procedure. Lin and Jeon (2006) introduced the potential nearest neighbor framework and Biau and Devroye (2010) later established related consistency properties. In the last several years, a number of important statistical properties of random forests have also been established whenever base learners are constructed with subsamples rather than bootstrap samples. Scornet et al. (2015) provided the first consistency result for Breiman's original random forest algorithm whenever the true underlying regression function is assumed to be additive. Despite the impressive volume of research from the past two decades and the exciting recent progress in establishing their statistical properties, a satisfying explanation for the sustained empirical success of random forests has yet to be provided.

procedure, random forest, selection procedure, (11 more...)

#artificialintelligence

AI-Alerts: 2019 > 2019-11 > AAAI AI-Alert for Nov 12, 2019 (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


Privacy-Preserving Gradient Boosting Decision Trees

Li, Qinbin, Wu, Zhaomin, Wen, Zeyi, He, Bingsheng

arXiv.org Machine Learning | Nov-11-2019

The Gradient Boosting Decision Tree (GBDT) has been a popular machine learning model for various tasks in recent years. In this paper, we study how to improve model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds require more noise to obtain a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss, especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the property of gradient and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data for each iteration and leaf node clipping in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines.
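The link between sensitivity bounds and noise that the abstract above relies on can be shown with the standard Laplace mechanism. The sketch below is a generic illustration, not the paper's training algorithm. It clips per-record gradients to a hypothetical bound, treats neighbouring datasets as differing by adding or removing one record (so the sum has sensitivity equal to the bound), and adds Laplace noise with scale sensitivity / epsilon. All names are illustrative.

```python
import random


def clip(values, bound):
    """Clamp every value to the interval [-bound, bound]."""
    return [max(-bound, min(bound, v)) for v in values]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_gradient_sum(gradients, bound, epsilon, rng):
    """Return (noisy sum of clipped gradients, Laplace scale used).

    Assumes add-or-remove-one neighbouring datasets, so one record can
    change the clipped sum by at most `bound`.
    """
    clipped = clip(gradients, bound)
    sensitivity = bound
    scale = sensitivity / epsilon
    return sum(clipped) + laplace_noise(scale, rng), scale


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [-5.0, 0.2, 3.0, -0.4]
    for bound in (4.0, 1.0):
        noisy, scale = private_gradient_sum(grads, bound, epsilon=0.5, rng=rng)
        print(f"bound={bound} scale={scale} noisy_sum={noisy:.3f}")
```

Halving the clipping bound halves the noise scale at the same epsilon, at the cost of distorting gradients that exceed the bound. That trade-off is why tighter sensitivity bounds matter.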

gradient, privacy budget, sensitivity, (15 more...)

arXiv.org Machine Learning

1911.04209

Country:

Asia > Singapore (0.04)
Oceania > Australia > Western Australia (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


Simplifying Random Forests: On the Trade-off between Interpretability and Accuracy

Rapp, Michael, Mencía, Eneldo Loza, Fürnkranz, Johannes

arXiv.org Machine Learning | Nov-11-2019

We analyze the trade-off between model complexity and accuracy for random forests by breaking the trees up into individual classification rules and selecting a subset of them. We show experimentally that a few rules are already sufficient to achieve an acceptable accuracy close to that of the original model. Moreover, our results indicate that in many cases, this can lead to simpler models that clearly outperform the original ones.
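The first step mentioned in the abstract above, breaking a tree into individual rules, can be sketched as a walk over root-to-leaf paths. The nested-dictionary tree format and the feature names below are assumptions made for illustration. The rule selection step the authors describe is not shown.

```python
def extract_rules(node, conditions=()):
    """Return one (conditions, prediction) pair per root-to-leaf path.

    A leaf is a dict with a "prediction" key; an internal node has
    "feature", "threshold", "left" (feature <= threshold) and "right".
    """
    if "prediction" in node:
        return [(conditions, node["prediction"])]
    feature, threshold = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + ((feature, "<=", threshold),))
    right = extract_rules(node["right"], conditions + ((feature, ">", threshold),))
    return left + right


# Hypothetical toy tree, not taken from the paper.
tree = {
    "feature": "age",
    "threshold": 30,
    "left": {"prediction": "no"},
    "right": {
        "feature": "income",
        "threshold": 50,
        "left": {"prediction": "no"},
        "right": {"prediction": "yes"},
    },
}

if __name__ == "__main__":
    for conditions, prediction in extract_rules(tree):
        body = " AND ".join(f"{f} {op} {t}" for f, op, t in conditions)
        print(f"IF {body} THEN {prediction}")
```

Applying this to every tree in a forest yields a pool of rules from which a smaller subset could then be chosen.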

decision boundary, simplifying random forest, subset, (9 more...)

arXiv.org Machine Learning

1911.04393

Country: Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)

Genre:

Research Report > New Finding (0.51)
Research Report > Promising Solution (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)


Practical Federated Gradient Boosting Decision Trees

Li, Qinbin, Wen, Zeyi, He, Bingsheng

arXiv.org Machine Learning | Nov-11-2019

Gradient Boosting Decision Trees (GBDTs) have become very successful in recent years, with many awards in machine learning and data mining competitions. There have been several recent studies on how to train GBDTs in the federated learning setting. In this paper, we focus on horizontal federated learning, where data samples with the same features are distributed among multiple parties. However, existing studies are not efficient or effective enough for practical use. They suffer either from inefficiency due to the use of costly data transformations such as secure sharing and homomorphic encryption, or from low model accuracy due to differential privacy designs. In this paper, we study a practical federated environment with relaxed privacy constraints. In this environment, a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties. Specifically, each party boosts a number of trees by exploiting similarity information based on locality-sensitive hashing. We prove that our framework is secure without exposing the original record to other parties, while the computation overhead in the training process is kept low. Our experimental studies show that, compared with normal training with the local data of each owner, our approach can significantly improve the predictive accuracy, and achieve comparable accuracy to the original GBDT with the data from all parties.
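Locality-sensitive hashing, which the abstract above uses to share similarity information instead of raw records, maps similar vectors to similar hash values. The sketch below uses the random-hyperplane (sign) family as one well-known example. Whether the paper uses this particular family is not stated in the abstract, so treat the choice, the function names and the parameters as assumptions.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    planes = make_hyperplanes(n_bits=16, dim=3)
    base = signature([1.0, 2.0, 3.0], planes)
    near = signature([1.1, 2.0, 2.9], planes)
    far = signature([-1.0, -2.0, -3.0], planes)
    print(hamming(base, near), hamming(base, far))
```

Vectors separated by a small angle tend to agree on most bits, so parties that share the same hyperplanes could compare signatures to find similar records without exchanging the records themselves.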

gradient, hash value, simfl, (15 more...)

arXiv.org Machine Learning

1911.04206

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Singapore (0.04)
Oceania > Australia > Western Australia (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.62)


A Comprehensive Guide to Random Forest in R

#artificialintelligence | Nov-9-2019, 02:07:23 GMT

Classification is the task of predicting the class of a given input data point. Classification problems are common in machine learning and fall under supervised learning.

algorithm, comprehensive guide, random forest

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)


Why You Should Build XGBoost Models Within H2O - Sefik Ilkin Serengil

#artificialintelligence | Nov-7-2019, 15:10:43 GMT

XGBoost triggered the rise of tree-based models in the machine learning world. It has earned its reputation with robust models, which mostly achieve almost 2% higher accuracy. On the other hand, XGBoost is almost 10 times slower than LightGBM. Speed means a lot in a data challenge.

building id, regular xgboost, xgboost, (11 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
