AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
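As a minimal sketch of this idea, the snippet below combines three base classifiers by majority vote. The labels and predictions are hand-made for illustration (they come from no dataset or paper on this page), and the gain shown depends on the base models making their errors on different examples.

```python
from collections import Counter

def majority_vote(votes):
    """Return the most common label among the base-model votes."""
    return Counter(votes).most_common(1)[0][0]

def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

# Hypothetical true labels and base-model outputs (illustrative only).
y_true = [0, 0, 0, 1, 1, 1]
preds_a = [1, 0, 0, 1, 1, 1]  # wrong on example 0
preds_b = [0, 0, 1, 1, 1, 1]  # wrong on example 2
preds_c = [0, 0, 0, 1, 0, 1]  # wrong on example 4

# Combine the three models example by example.
ensemble = [majority_vote(votes) for votes in zip(preds_a, preds_b, preds_c)]

base_accuracies = [accuracy(p, y_true) for p in (preds_a, preds_b, preds_c)]
ensemble_accuracy = accuracy(ensemble, y_true)
```

Each base model misclassifies one of the six examples, while the vote recovers every label because no two models err on the same example. If the errors were correlated, the vote would not help in the same way.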

FairXGBoost: Fairness-aware Classification in XGBoost

Ravichandran, Srinivasan, Khurana, Drona, Venkatesh, Bharath, Edakunni, Narayanan Unny

arXiv.org Artificial Intelligence, Sep-3-2020

Highly regulated domains such as finance have long favoured the use of machine learning algorithms that are scalable, transparent, robust and yield better performance. One of the most prominent examples of such an algorithm is XGBoost. Meanwhile, there is also a growing interest in building fair and unbiased models in these regulated domains and numerous bias-mitigation algorithms have been proposed to this end. However, most of these bias-mitigation methods are restricted to specific model families such as logistic regression or support vector machine models, thus leaving modelers with a difficult decision of choosing between fairness from the bias-mitigation algorithms and scalability, transparency, performance from algorithms such as XGBoost. We aim to leverage the best of both worlds by proposing a fair variant of XGBoost that enjoys all the advantages of XGBoost, while also matching the levels of fairness from the state-of-the-art bias-mitigation algorithms. Furthermore, the proposed solution requires very little in terms of changes to the original XGBoost library, thus making it easy for adoption. We provide an empirical analysis of our proposed method on standard benchmark datasets used in the fairness community.

artificial intelligence, dataset, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2009.01442

Country:

Asia > India > Karnataka > Bengaluru (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Bristol (0.04)

Genre: Research Report (0.90)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Detecting Parkinson's Disease from Speech-task in an accessible and interpretable manner

Rahman, Wasifur, Lee, Sangwu, Islam, Md. Saiful, Mamun, Abdullah Al, Antony, Victor, Ratnu, Harshil, Ali, Mohammad Rafayet, Hoque, Ehsan

arXiv.org Machine Learning, Sep-2-2020

Every nine minutes a person is diagnosed with Parkinson's Disease (PD) in the United States. However, studies have shown that between 25% and 80% of individuals with PD remain undiagnosed. An online, in-the-wild audio recording application has the potential to help screen for the disease if risk can be accurately assessed. In this paper, we collect data from 726 unique subjects (262 PD and 464 non-PD) uttering the sentence "quick brown fox jumps over the lazy dog ...." to conduct automated PD assessment. We extracted both standard acoustic features and deep-learning-based embedding features from the speech data and trained several machine learning algorithms on them. Our models achieved 0.75 AUC by modeling the standard acoustic features with the XGBoost model. We also provide an explanation of our model's decisions and show that it focuses mostly on the widely used MFCC features and a subset of dysphonia features previously used for detecting PD from verbal phonation tasks.

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Machine Learning

2009.01231

Country:

North America > United States (0.48)
Asia > India (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)

Improved Weighted Random Forest for Classification Problems

Shahhosseini, Mohsen, Hu, Guiping

arXiv.org Machine Learning, Sep-1-2020

Several studies have shown that combining machine learning models in an appropriate way improves on the individual predictions made by the base models. The key to building a well-performing ensemble model lies in the diversity of the base models. Among the most common solutions for introducing diversity into decision trees are bagging and random forest. Bagging enhances diversity by sampling with replacement to generate many training data sets, while random forest additionally selects a random subset of features. This has made the random forest a winning candidate for many machine learning applications. However, assuming equal weights for all base decision trees does not seem reasonable, as the randomization of sampling and input feature selection may lead to different levels of decision-making ability across base decision trees. Therefore, we propose several algorithms that modify the weighting strategy of the regular random forest and consequently make better predictions. The designed weighting frameworks include optimal weighted random forest based on accuracy, optimal weighted random forest based on the area under the curve (AUC), performance-based weighted random forest, and several stacking-based weighted random forest models. The numerical results show that the proposed models are able to introduce significant improvements compared to regular random forest.
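The weighting idea can be sketched roughly as follows. This is not the authors' algorithm: it assumes one-split trees (stumps), a random feature subset of size one, and weights proportional to each tree's accuracy on the rows left out of its bootstrap sample. All function names and the toy data are hypothetical.

```python
import random

def fit_stump(rows, feature):
    """Fit a one-split tree on a single feature by minimising training errors."""
    best = None
    for x, _ in rows:
        threshold = x[feature]
        for flip in (0, 1):
            errors = sum((int(xi[feature] > threshold) ^ flip) != yi for xi, yi in rows)
            if best is None or errors < best[0]:
                best = (errors, threshold, flip)
    _, threshold, flip = best
    return lambda x: int(x[feature] > threshold) ^ flip

def fit_weighted_forest(rows, n_trees=15, seed=0):
    """Bagged stumps, each weighted by its accuracy on held-out rows (an assumption)."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    trees, scores = [], []
    for _ in range(n_trees):
        picked = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        in_bag = set(picked)
        sample = [rows[i] for i in picked]
        # Rows not drawn into the bootstrap sample; fall back to all rows if none are left.
        held_out = [r for i, r in enumerate(rows) if i not in in_bag] or rows
        tree = fit_stump(sample, rng.randrange(n_features))  # random feature choice
        trees.append(tree)
        scores.append(sum(tree(x) == y for x, y in held_out) / len(held_out))
    total = sum(scores)
    weights = [s / total for s in scores] if total > 0 else [1 / n_trees] * n_trees
    return trees, weights

def predict(trees, weights, x):
    """Weighted vote: predict 1 when trees voting 1 hold at least half the weight."""
    vote_for_one = sum(w for tree, w in zip(trees, weights) if tree(x) == 1)
    return int(vote_for_one >= 0.5)

# Toy data: the label depends on feature 0 only; feature 1 is a scrambled copy.
rows = [((x0, (7 * x0) % 10), int(x0 >= 5)) for x0 in range(10)]
trees, weights = fit_weighted_forest(rows)
label = predict(trees, weights, (8, 6))
```

A regular random forest corresponds to setting every weight to `1 / n_trees`; the sketch only changes how votes are aggregated, not how trees are grown.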

artificial intelligence, machine learning, random forest, (16 more...)

arXiv.org Machine Learning

2009.00534

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

Random Forest (RF) Kernel for Regression, Classification and Survival

Feng, Dai, Baumgartner, Richard

arXiv.org Machine Learning, Aug-31-2020

Breiman's random forest (RF) can be interpreted as an implicit kernel generator, where the ensuing proximity matrix represents the data-driven RF kernel. The kernel perspective on the RF has been used to develop a principled framework for theoretical investigation of its statistical properties. However, the practical utility of the links between kernels and the RF has not been widely explored and systematically evaluated. The focus of our work is the investigation of the interplay between kernel methods and the RF. We elucidate the performance and properties of the data-driven RF kernels used by regularized linear models in a comprehensive simulation study comprising continuous, binary and survival targets. We show that for continuous and survival targets, the RF kernels are competitive with the RF in higher-dimensional scenarios with a larger number of noisy features. For the binary target, the RF kernel and RF exhibit comparable performance. As the RF kernel asymptotically converges to the Laplace kernel, we included it in our evaluation. For most simulation setups, the RF and RF kernel outperformed the Laplace kernel. Nevertheless, in some cases the Laplace kernel was competitive, showing its potential value for applications. We also provide results from real-life data sets for regression, classification and survival to illustrate how these insights may be leveraged in practice. Finally, we discuss further extensions of the RF kernels in the context of interpretable prototype and landmarking classification, regression and survival. We outline a future line of research for kernels furnished by Bayesian counterparts of the RF.
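One common way to compute an RF proximity matrix is to count, for each pair of samples, the fraction of trees in which both land in the same leaf. The sketch below assumes that definition and uses hand-written leaf assignments in place of a fitted forest; `rf_proximity` and `leaf_ids` are hypothetical names, and the paper may use a different variant.

```python
def rf_proximity(leaf_ids):
    """Proximity matrix from per-tree leaf assignments.

    leaf_ids[t][i] is the index of the leaf that sample i reaches in tree t.
    Entry (i, j) of the result is the fraction of trees in which samples
    i and j share a leaf.
    """
    n_trees = len(leaf_ids)
    n_samples = len(leaf_ids[0])
    return [
        [sum(tree[i] == tree[j] for tree in leaf_ids) / n_trees
         for j in range(n_samples)]
        for i in range(n_samples)
    ]

# Hypothetical leaf assignments: 4 trees, 3 samples.
leaf_ids = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 2],
    [0, 0, 1],
]
K = rf_proximity(leaf_ids)
# K[0][1] is 0.75, K[0][2] is 0.25, K[1][2] is 0.5, and the diagonal is 1.0.
```

The resulting matrix is symmetric with a unit diagonal, which is what lets it be passed to kernel methods such as regularized linear models operating on a precomputed kernel.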

artificial intelligence, decision tree learning, machine learning, (19 more...)

arXiv.org Machine Learning

2009.00089

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California (0.05)
North America > United States > Iowa (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.62)

MementoML: Performance of selected machine learning algorithm configurations on OpenML100 datasets

Kretowicz, Wojciech, Biecek, Przemysław

arXiv.org Machine Learning, Aug-30-2020

Finding optimal hyperparameters for a machine learning algorithm can often significantly improve its performance. But how can they be chosen in a time-efficient way? In this paper we present a protocol for generating benchmark data describing the performance of different ML algorithms with different hyperparameter configurations. Data collected in this way is used to study the factors influencing an algorithm's performance. This collection was prepared for the purposes of the EPP study. We tested the algorithms' performance on a dense grid of hyperparameters. The tested datasets and hyperparameters were chosen before any algorithm was run and were not changed. This differs from the approach usually used in hyperparameter tuning, where the selection of candidate hyperparameters depends on the results obtained previously. However, such a selection allows for systematic analysis of how sensitive performance is to individual hyperparameters. This resulted in a comprehensive dataset of such benchmarks that we would like to share. We hope that the computed and collected results may be helpful for other researchers. This paper describes the way the data was collected. Here you can find benchmarks of 7 popular machine learning algorithms on 39 OpenML datasets. The detailed data forming this benchmark are available at: https://www.kaggle.com/mi2datalab/mementoml.
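A grid that is fixed before any model is trained can be enumerated as a plain Cartesian product, as in the sketch below. The hyperparameter names and values are illustrative assumptions, not the grid used for MementoML.

```python
from itertools import product

# Hypothetical hyperparameter grid, chosen up front and never adapted to results.
grid = {
    "max_depth": [2, 4, 8],
    "learning_rate": [0.01, 0.1],
    "n_estimators": [100, 500],
}

# Every combination of values, one dict per configuration.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

n_configs = len(configs)  # 3 * 2 * 2 = 12
first = configs[0]        # {'max_depth': 2, 'learning_rate': 0.01, 'n_estimators': 100}
```

Because the list does not depend on any earlier evaluation, every configuration can be run on every dataset, which is what makes per-hyperparameter sensitivity comparisons straightforward.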

artificial intelligence, machine learning, openml, (16 more...)

arXiv.org Machine Learning

2008.13162

Country: Europe > Poland > Masovia Province > Warsaw (0.05)

Genre: Research Report (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.30)

Random Forest Vs XGBoost Tree Based Algorithms

#artificialintelligence, Aug-29-2020, 04:12:06 GMT

In machine learning, we mainly deal with two kinds of problems: classification and regression. There are several different types of algorithms for both tasks, but we need to pick the algorithm that performs well on the data at hand. Tree-based algorithms such as Decision Tree, and the ensemble methods built on it such as Random Forest and XGBoost, have shown very good results on classification. These algorithms give high accuracy at fast speed.

algorithm, artificial intelligence, machine learning, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

agtboost: Adaptive and Automatic Gradient Tree Boosting Computations

Lunde, Berent Ånund Strømnes, Kleppe, Tore Selland

arXiv.org Machine Learning, Aug-28-2020

Gradient tree boosting (GTB) (Friedman 2001; Mason, Baxter, Bartlett, and Frean 1999) has risen to prominence for regression problems after the introduction of xgboost (Chen and Guestrin 2016). The GTB model is an ensemble-type model that consists of classification and regression trees (CART) (Breiman, Friedman, Stone, and Olshen 1984) that are learned in an iterative manner. GTB models are very flexible in that they automatically learn nonlinear relationships and interaction effects. However, with the increased flexibility of GTB models come substantial worries about overfitting. The top-performing gradient tree boosting libraries, such as xgboost, LightGBM (Ke, Meng, Finley, Wang, Chen, Ma, Ye, and Liu 2017) and catboost (Dorogush, Ershov, and Gulin 2018), all come with a large number of hyperparameters available for manual tuning to constrain the complexity of the GTB models. Training gradient tree boosting models, in general, thus requires some familiarity with both the chosen package and the data for efficient tuning and application to the problem at hand. The main focus of the hyperparameters and tuning is to solve the following problems: - The complexity of trees: What is the topology of all the different trees?
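The iterative learning described above can be sketched for squared-error loss, where each new tree is fit to the current residuals. This is a bare illustration with one-split trees, a single feature and two manually chosen hyperparameters (`n_rounds`, `learning_rate`); it is not how agtboost, xgboost, LightGBM or catboost are implemented, and all names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: one threshold, one constant per side."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every split that leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of squared loss
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Toy one-dimensional regression data (illustrative only).
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 1.9, 4.1, 5.0, 5.2]
model = boost(xs, ys)
```

More rounds and a larger learning rate fit the training data more closely, which is the overfitting risk that hyperparameters such as tree depth and the number of trees are meant to control.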

artificial intelligence, iteration, machine learning, (18 more...)

arXiv.org Machine Learning

2008.12625

Country:

Europe > Austria > Vienna (0.14)
Europe > Norway > Western Norway > Rogaland > Stavanger (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

How The New AI Model For Rapid COVID-19 Screening Works?

#artificialintelligence, Aug-27-2020

With the current pandemic spreading like wildfire, the need for faster diagnosis could not be more critical. Traditional real-time polymerase chain reaction (RT-PCR) testing using nose and throat swabs has been described as not only having limited sensitivity but also being time-consuming for operational reasons. Thus, to expedite the process of COVID-19 diagnosis, researchers from the University of Oxford developed two early-detection AI models leveraging the routine data collected from clinical reports. In a recent paper, the Oxford researchers presented the two AI models and highlighted their effectiveness in screening for the virus in patients coming to the hospital, whether for an emergency checkup or for admission. To validate these real-time prediction models, the researchers used primary clinical data, including the patients' lab tests, vital signs and blood reports.

artificial intelligence, blood test, machine learning, (13 more...)

#artificialintelligence

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.35)

Genre: Research Report > Experimental Study (0.39)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.35)

Machine learning for the selection of carbon-based materials for tetracycline and sulfamethoxazole adsorption

#artificialintelligence, Aug-27-2020

Antibiotic adsorption on carbon-based materials was modeled by machine learning. Random forest showed better prediction accuracy than GBT and ANN. Impact tendencies of SBET, pHsol, C0 on adsorption were similar for TC and SMX. Chemical compositions and pHpzc of CBMs showed different influences on TC and SMX. Antibiotics as emerging pollutants have attracted extensive attention due to their ecotoxicity and persistence in the environment.

adsorption, artificial intelligence, machine learning, (16 more...)

#artificialintelligence

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.85)
Water & Waste Management > Water Management (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.51)

Feature Selection Methods for Cost-Constrained Classification in Random Forests

Jagdhuber, Rudolf, Lang, Michel, Rahnenführer, Jörg

arXiv.org Machine Learning, Aug-17-2020

Cost-sensitive feature selection describes a feature selection problem where features incur individual costs for inclusion in a model. These costs make it possible to incorporate disfavored aspects of features, e.g. failure rates of a measuring device, or patient harm, in the model selection process. Random Forests pose a particularly challenging problem for feature selection, as features are generally entangled in an ensemble of multiple trees, which makes a post hoc removal of features infeasible. Feature selection methods therefore often either focus on simple pre-filtering methods, or require many Random Forest evaluations along their optimization path, which drastically increases the computational complexity. To solve both issues, we propose Shallow Tree Selection, a novel fast and multivariate feature selection method that selects features from small tree structures. Additionally, we adapt three standard feature selection algorithms for cost-sensitive learning by introducing a hyperparameter-controlled benefit-cost ratio criterion (BCR) for each method. In an extensive simulation study, we assess this criterion, and compare the proposed methods to multiple performance-based baseline alternatives on four artificial data settings and seven real-world data settings. We show that all methods using a hyperparameterized BCR criterion outperform the baseline alternatives. In a direct comparison between the proposed methods, each method shows strengths in certain settings, but no one-size-fits-all solution exists. On a global average, we could identify preferable choices among our BCR-based methods. Nevertheless, we conclude that a practical analysis should never rely on a single method only, but always compare different approaches to obtain the best results.
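A hyperparameter-controlled benefit-cost ratio can be illustrated with a greedy selection under a cost budget. The exact criterion in the paper is not given in this abstract, so the form `benefit / cost ** xi` below is an assumption, as are the function name, the fixed per-feature benefits (a real forward selection would re-estimate them after each step) and the numbers.

```python
def select_by_bcr(benefit, cost, budget, xi=1.0):
    """Greedily add the affordable feature with the highest benefit / cost**xi.

    xi = 0 ignores costs when ranking; larger xi penalises expensive features more.
    """
    selected, spent = [], 0.0
    remaining = set(benefit)
    while True:
        affordable = [f for f in remaining if spent + cost[f] <= budget]
        if not affordable:
            break
        best = max(affordable, key=lambda f: benefit[f] / cost[f] ** xi)
        selected.append(best)
        spent += cost[best]
        remaining.remove(best)
    return selected

# Hypothetical per-feature benefits (e.g. accuracy gain) and acquisition costs.
benefit = {"a": 0.30, "b": 0.25, "c": 0.10}
cost = {"a": 10.0, "b": 2.0, "c": 1.0}

cost_aware = select_by_bcr(benefit, cost, budget=12.0, xi=1.0)  # ['b', 'c']
cost_blind = select_by_bcr(benefit, cost, budget=12.0, xi=0.0)  # ['a', 'b']
```

With the same budget, the exponent alone changes which features are chosen: the cost-aware run skips the expensive feature `a`, while the cost-blind run takes it first and can no longer afford `c`.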

artificial intelligence, machine learning, selection, (15 more...)

arXiv.org Machine Learning

2008.06298

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > Wisconsin (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.84)
