AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
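As a rough illustration of the definition above, the following sketch blends the outputs of several models by simple averaging. The three score lists and the 0.5 threshold are hypothetical values chosen for the example, and averaging is only one of many ways to combine models.

```python
def average_ensemble(probabilities_per_model):
    """Average per-example probabilities across models."""
    n_models = len(probabilities_per_model)
    return [sum(column) / n_models for column in zip(*probabilities_per_model)]

# Hypothetical predicted probabilities from three models for four examples.
tree_scores = [0.9, 0.2, 0.6, 0.1]
boosting_scores = [0.7, 0.4, 0.4, 0.2]
network_scores = [0.8, 0.3, 0.2, 0.3]

blended = average_ensemble([tree_scores, boosting_scores, network_scores])

# Turn the blended probability into a yes/no decision (assumed 0.5 cutoff).
decisions = [p >= 0.5 for p in blended]
```

On the third example the tree model alone would say yes (0.6), but the blended score falls below the cutoff, so the ensemble decision differs from that of one of its members.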

Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables

#artificialintelligence Oct-4-2019, 09:39:01 GMT

Random forest and similar machine learning techniques are already used to generate spatial predictions, but the spatial location of points (geography) is often ignored in the modeling process. Spatial autocorrelation, especially if it persists in the cross-validation residuals, indicates that the predictions may be biased, which is suboptimal. This paper presents a random forest for spatial predictions framework (RFsp) in which buffer distances from observation points are used as explanatory variables, thus incorporating geographical proximity effects into the prediction process. The RFsp framework is illustrated with examples that use textbook datasets and apply spatial and spatio-temporal prediction to numeric, binary, categorical, multivariate and spatio-temporal variables. Performance of the RFsp framework is compared with state-of-the-art kriging techniques using fivefold cross-validation with refitting.
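The core idea in the abstract above, using distances from observation points as explanatory variables, can be sketched as follows. This is a minimal illustration that assumes plain Euclidean distance and hypothetical coordinates; it only builds the distance features and is not the authors' implementation. The resulting columns would be passed, together with any other covariates, to a random forest of the reader's choice.

```python
import math

def buffer_distance_features(targets, observations):
    """Return one row per target location; column j holds the Euclidean
    distance from that target to observation point j."""
    return [
        [math.dist(target, obs) for obs in observations]
        for target in targets
    ]

# Hypothetical observation points (x, y) and locations to predict at.
observations = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
targets = [(0.0, 0.0), (3.0, 0.0)]

# Each row is the feature vector for one target location.
features = buffer_distance_features(targets, observations)
```

With one column per observation point, the number of features grows with the number of observations, which is worth keeping in mind for large datasets.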

prediction, random forest, spatial and spatio-temporal variable, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.86)

Using Machine Learning in Venture Capital

#artificialintelligence Oct-2-2019, 02:58:06 GMT

I have already (partially) reviewed previous studies in which data have been shown to help identify signals relevant to assessing the success potential of a startup. Even though the list is quite comprehensive, each study tends to look at a single factor and a couple of different success scenarios (namely, acquisition and IPO). In our work, we tried to take a more holistic view and used over 120,000 companies to spot signals not only for acquisitions and IPOs but also to compute the probability of raising a subsequent round of funding or shutting the startup down. In the same fashion as backtesting, we created a time-aware approach: we analyzed companies that were no more than four years old in 2015 and tried to predict their success in the following three years. We also used more than a hundred variables as possible explanatory indicators of success, as well as five different models: Support Vector Machines (SVM), Decision Trees (DT), Random Forests (RF), Extremely Randomized Trees (ERT), and Gradient Tree Boosting (GTB).

acquisition and ipo, machine learning, venture capital, (2 more...)

Industry: Banking & Finance > Capital Markets (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.61)

Identifying Cancer Patients at Risk for Heart Failure Using Machine Learning Methods

Yang, Xi, Gong, Yan, Waheed, Nida, March, Keith, Bian, Jiang, Hogan, William R., Wu, Yonghui

arXiv.org Machine Learning Oct-1-2019

Cardiotoxicity related to cancer therapies has become a serious issue, diminishing cancer treatment outcomes and quality of life. Detecting cancer patients at risk for cardiotoxicity before cardiotoxic treatments begin, and providing preventive measures, are potential ways to improve cancer patients' quality of life. This study focuses on predicting the development of heart failure in cancer patients after cancer diagnosis using historical electronic health record (EHR) data. We examined four machine learning algorithms using 143,199 cancer patients from the University of Florida Health (UF Health) Integrated Data Repository (IDR). We identified a total of 1,958 qualified cases and matched them to 15,488 controls by gender, age, race, and major cancer type. Two feature encoding strategies were compared for encoding variables as machine learning features. The gradient boosting (GB) based model achieved the best AUC score of 0.9077 (with a sensitivity of 0.8520 and a specificity of 0.8138), outperforming the other machine learning methods. We also looked into the subgroup of cancer patients with exposure to chemotherapy drugs and observed a lower specificity score (0.7089). The experimental results show that machine learning methods are able to capture clinical factors that are known to be associated with heart failure and that it is feasible to use machine learning methods to identify cancer patients at risk for cancer therapy-related heart failure.

cancer patient, heart failure, prediction, (13 more...)

1910.00582

Country:

Europe (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Alachua County > Gainesville (0.04)
Asia > Middle East > Oman (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.67)

Machine Truth Serum

Luo, Tianyi, Liu, Yang

arXiv.org Artificial Intelligence Sep-27-2019

The wisdom of the crowd reveals a striking fact: the majority answer from a crowd is often more accurate than that of any individual expert. We observe the same story in machine learning, where ensemble methods leverage this idea by combining multiple learning algorithms to obtain better classification performance. Among many popular examples is the celebrated Random Forest, which applies the majority voting rule to aggregate different decision trees into a final prediction. Nonetheless, such aggregation rules fail when the majority is more likely to be wrong. In this paper, we extend the idea proposed in Bayesian Truth Serum that "a surprisingly more popular answer is more likely the true answer" to classification problems. The challenge is to define or detect when an answer should be considered "surprising". We present two machine learning aided methods that aim to reveal the truth when it is the minority rather than the majority that holds the true answer. Our experiments on real-world datasets show that better classification performance can be obtained than by always trusting the majority vote. Our proposed methods also outperform popular ensemble algorithms. Our approach can be applied generically as a subroutine in ensemble methods to replace the majority voting rule.
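The contrast between majority voting and the "surprisingly popular" rule quoted above can be sketched as follows. This is a simplified illustration, not the paper's two methods: it assumes the predicted vote shares are simply given (the hypothetical `predicted_share` argument), whereas the paper is about learning when an answer counts as surprising.

```python
from collections import Counter

def majority_vote(votes):
    """Return the label that received the most votes."""
    return Counter(votes).most_common(1)[0][0]

def surprisingly_popular(votes, predicted_share):
    """Return the label whose actual vote share most exceeds the share
    it was predicted to receive."""
    counts = Counter(votes)
    total = len(votes)
    return max(
        counts,
        key=lambda label: counts[label] / total - predicted_share.get(label, 0.0),
    )

# Hypothetical votes from ten classifiers on one example.
votes = ["A"] * 6 + ["B"] * 4

# Hypothetical prior expectation of how the votes would split.
predicted_share = {"A": 0.8, "B": 0.2}

majority_answer = majority_vote(votes)
surprising_answer = surprisingly_popular(votes, predicted_share)
```

Here "A" wins the majority vote with 60% of the votes, but "B" received twice the share it was expected to get, so the surprisingly popular rule selects "B".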

algorithm, classifier, machine truth serum, (14 more...)

1909.13004

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Predicting the risk of emergency admission with machine learning: Development and validation using linked electronic health records

#artificialintelligence Sep-20-2019, 17:16:02 GMT

We used longitudinal data from linked electronic health records of 4.6 million patients aged 18–100 years from 389 practices across England between 1985 and 2015. The population was divided into a derivation cohort (80%, 3.75 million patients from 300 general practices) and a validation cohort (20%, 0.88 million patients from 89 general practices) from geographically distinct regions with different risk levels. We first replicated a previously reported Cox proportional hazards (CPH) model for predicting the risk of the first emergency admission up to 24 months after baseline. This reference model was then compared with two machine learning models, random forest (RF) and gradient boosting classifier (GBC). The initial set of predictors for all models comprised 43 variables, including patient demographics, lifestyle factors, laboratory tests, currently prescribed medications, selected morbidities, and previous emergency admissions.

electronic health record, emergency admission, validation, (14 more...)

Country: Europe > United Kingdom > England (0.26)

Genre:

Research Report > Strength Medium (0.50)
Research Report > Experimental Study (0.50)

Industry: Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.58)

Learning to Tune XGBoost with XGBoost

Sommer, Johanna, Sarigiannis, Dimitrios, Parnell, Thomas

arXiv.org Machine Learning Sep-19-2019

In this short paper we investigate whether meta-learning techniques can be used to tune the hyperparameters of machine learning models more effectively using successive halving (SH). We propose a novel variant of the SH algorithm (MeSH) that uses meta-regressors to determine which candidate configurations should be eliminated at each round. We apply MeSH to the problem of tuning the hyperparameters of a gradient-boosted decision tree model. By training and tuning our meta-regressors using existing tuning jobs from 95 datasets, we demonstrate that MeSH can often find a superior solution to both SH and random search.
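For readers unfamiliar with successive halving, the baseline algorithm mentioned above can be sketched as follows. This shows plain SH only, not the MeSH variant, which according to the abstract would use meta-regressors to decide which configurations to eliminate. The `toy_loss` objective, the candidate learning rates, and the halving factor of 2 are hypothetical choices for illustration.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Evaluate the surviving configs at the current budget, keep the
    best 1/eta of them, multiply the budget by eta, and repeat until a
    single config remains."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Lower loss is better, so sort ascending and keep the front.
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        keep = max(1, len(survivors) // eta)
        survivors = ranked[:keep]
        budget *= eta
    return survivors[0]

def toy_loss(config, budget):
    """Hypothetical validation loss: shrinks as the budget grows and is
    lowest for learning rates near 0.1."""
    return abs(config["learning_rate"] - 0.1) + 1.0 / budget

candidates = [
    {"learning_rate": lr}
    for lr in (0.001, 0.01, 0.05, 0.1, 0.3, 0.5, 0.9, 1.0)
]
best = successive_halving(candidates, toy_loss)
```

With eight candidates the loop runs three rounds (8 to 4 to 2 to 1), at budgets 1, 2, and 4. In a real tuning job the budget would typically be a number of boosting rounds or training iterations.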

configuration, validation loss, xgboost, (16 more...)

1909.07218

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
Europe > Germany > Baden-Württemberg (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

InterpretML: A Unified Framework for Machine Learning Interpretability

Nori, Harsha, Jenkins, Samuel, Koch, Paul, Caruana, Rich

arXiv.org Machine Learning Sep-19-2019

InterpretML is an open-source Python package that exposes machine learning interpretability algorithms to practitioners and researchers. InterpretML exposes two types of interpretability: glassbox models, which are machine learning models designed for interpretability (e.g., linear models, rule lists, generalized additive models), and blackbox explainability techniques for explaining existing systems (e.g., Partial Dependence, LIME). The package enables practitioners to easily compare interpretability algorithms by exposing multiple methods under a unified API and by providing a built-in, extensible visualization platform. InterpretML also includes the first implementation of the Explainable Boosting Machine, a powerful, interpretable, glassbox model that can be as accurate as many blackbox models. The MIT-licensed source code can be downloaded from github.com/microsoft/interpret.

algorithm, interpretml, prediction, (13 more...)

1909.09223

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
(3 more...)

Genre: Research Report (0.65)

Industry: Health & Medicine > Therapeutic Area (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.50)

Voting with Random Classifiers (VORACE)

Cornelio, Cristina, Donini, Michele, Loreggia, Andrea, Pini, Maria Silvia, Rossi, Francesca

arXiv.org Artificial Intelligence Sep-18-2019

In many machine learning scenarios, looking for the best classifier to fit a particular dataset can be very costly in terms of time and resources. Moreover, it can require deep knowledge of the specific domain. We propose a new technique that does not require profound expertise in the domain and avoids the commonly used strategy of hyper-parameter tuning and model selection. Our method is an innovative ensemble technique that uses voting rules over a set of randomly generated classifiers. Given a new input sample, we interpret the output of each classifier as a ranking over the set of possible classes. We then aggregate these output rankings using a voting rule, which treats them as preferences over the classes. We show that our approach obtains good results compared with the state of the art, providing both a theoretical analysis and an empirical evaluation of the approach on several datasets.
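The aggregation step described above, treating each classifier's output as a ranking over classes and combining the rankings with a voting rule, can be sketched as follows. The abstract does not name a specific rule here, so this sketch assumes the Borda count as one possible choice; the class labels and rankings are hypothetical.

```python
def borda_winner(rankings):
    """Aggregate rankings with the Borda count.

    Each ranking lists classes from most to least preferred. A class in
    position p of a ranking of length m earns m - 1 - p points.
    Returns the winning class and the full score table."""
    scores = {}
    for ranking in rankings:
        m = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] = scores.get(label, 0) + (m - 1 - position)
    return max(scores, key=scores.get), scores

# Hypothetical output rankings from four classifiers for one input sample.
rankings = [
    ["cat", "dog", "bird"],
    ["dog", "cat", "bird"],
    ["dog", "bird", "cat"],
    ["cat", "dog", "bird"],
]
winner, scores = borda_winner(rankings)
```

In this example "cat" and "dog" each receive two first-place votes, so a plurality rule would tie, while the Borda count uses the lower positions to separate them and selects "dog".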

artificial intelligence, classifier, machine learning, (17 more...)

1909.08996

Country: Europe (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

8 Parameters to Qualify AI Solutions | SalesChoice

#artificialintelligence Sep-17-2019, 18:42:21 GMT

One way could be to identify some of the most critical parameters to look for in any AI solution, and to rate/label them on a standard scale. A few such parameters are discussed below. Perhaps the community and policymakers can crystallize these further and add to the list. Decision trees, random forest, gradient boosting, Monte Carlo, to name a few. The use of any one of these (say, regression) in a solution can technically qualify it as AI-enabled, but it would not be very accurate or useful for a user. This has led to disillusionment among early AI users, while also giving rise to a plethora of solutions and companies calling themselves AI.

artificial intelligence, decision tree learning, machine learning, (6 more...)

Industry: Health & Medicine (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.56)

Many Heads Are Better Than One: The Case For Ensemble Learning

#artificialintelligence Sep-17-2019, 11:38:33 GMT

"The interests of truth require a diversity of opinions." Banks and lenders are increasingly turning to AI and machine learning to automate their core functions and make more accurate predictions in credit underwriting and fraud detection. ML practitioners can take advantage of a growing number of modeling algorithms, such as simple decision trees, random forests, gradient boosting machines, deep neural networks, and support vector machines. Each method has its strengths and weaknesses, which is why it often makes sense to combine ML algorithms to provide even greater predictive performance than any single ML method could provide on its own. This method of combining algorithms is known as ensembling.

ensemble, ensemble model, neural network, (14 more...)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.72)
(2 more...)
