AITopics

Genre: Contests & Prizes (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Coulombe, Philippe Goulet

Slow-Growing Trees

arXiv.org Machine Learning, Mar-2-2021

Random Forest's performance can be matched by a single slow-growing tree (SGT), which uses a learning rate to tame CART's greedy algorithm. SGT exploits the view that CART is an extreme case of an iterative weighted least square procedure. Moreover, a unifying view of Boosted Trees (BT) and Random Forests (RF) is presented. Greedy ML algorithms' outcomes can be improved using either "slow learning" or diversification. SGT applies the former to estimate a single deep tree, and Booging (bagging stochastic BT with a high learning rate) uses the latter with additive shallow trees. The performance of this tree ensemble quaternity (Booging, BT, SGT, RF) is assessed on simulated and real regression tasks.
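
The abstract contrasts greedy fitting with "slow learning". The most familiar form of slow learning is shrinkage in boosted trees: every new tree's contribution is multiplied by a learning rate below 1, so no single greedy split dominates the fit. The sketch below is a minimal illustration of that idea, assuming squared loss, a single input, and depth-1 trees (stumps). The function names `fit_stump` and `boost` are hypothetical. It shows shrinkage in BT only and is not the paper's SGT procedure.

```python
# Minimal L2 boosting with regression stumps and a learning rate (shrinkage).
# Illustrates "slow learning" in boosted trees; this is NOT the SGT algorithm.

def fit_stump(x, r):
    """Return (threshold, left_mean, right_mean) minimizing squared error on r."""
    pairs = sorted(zip(x, r))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            thr = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, thr, lm, rm)
    if best is None:  # all x identical: fall back to a constant stump
        m = sum(r) / len(r)
        return (float("inf"), m, m)
    return best[1:]

def boost(x, y, n_rounds=100, learning_rate=0.1):
    """Fit stumps to residuals, adding each one scaled by learning_rate."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        thr, lm, rm = fit_stump(x, resid)
        stumps.append((thr, lm, rm))
        # Shrinkage: move only a fraction of the way toward the stump's fit.
        pred = [pi + learning_rate * (lm if xi <= thr else rm)
                for xi, pi in zip(x, pred)]

    def predict(xnew):
        return base + sum(learning_rate * (lm if xnew <= thr else rm)
                          for thr, lm, rm in stumps)
    return predict
```

With `learning_rate=1.0` a single stump fits a clean step exactly, while a small learning rate needs many rounds to get there; that slower, more gradual path is what shrinkage trades for lower variance.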

algorithm, goulet coulombe, sgt

2103.01926

Country:

North America > United States > Ohio (0.04)
North America > United States > California (0.04)
North America > United States > Pennsylvania (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Banking & Finance > Economy (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)

Ghosal, Indrayudh, Hooker, Giles

Generalised Boosted Forests

arXiv.org Machine Learning, Mar-2-2021

This paper extends recent work on boosting random forests to model non-Gaussian responses. Given a response from an exponential family with $\mathbb{E}[Y|X] = g^{-1}(f(X))$, our goal is to obtain an estimate of $f$. We start with an MLE-type estimate in the link space and then define generalised residuals from it. We use these residuals and some corresponding weights to fit a base random forest, and then repeat the procedure to obtain a boost random forest. We call the sum of these three estimators a "generalised boosted forest". We show with simulated and real data that both random forest steps reduce test-set log-likelihood, which we treat as our primary metric. We also provide a variance estimator, which we can obtain with the same computational cost as the original estimate itself. Empirical experiments on real-world data and simulations demonstrate that the methods can effectively reduce bias, and that confidence interval coverage is conservative in the bulk of the covariate distribution.
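
The abstract does not spell out which generalised residuals and weights are used. One standard choice, assumed here, is the IRLS working residual and weight; for a Poisson response with log link these are $(y - \mu)/\mu$ and $\mu$, with $\mu = \exp(f)$. The sketch computes them from a constant MLE start in link space. To keep it dependency-free, the weighted random forest is replaced by an intercept-only weighted mean, so this only illustrates the residual/weight step. The function name `poisson_working_residuals` is hypothetical.

```python
import math

def poisson_working_residuals(y, f):
    """IRLS working residuals and weights for a Poisson response, log link.

    f holds the current estimates in link space, so mu = exp(f).
    z = (y - mu) / mu is the first-order correction in link space;
    w = mu is the matching IRLS weight for the log link.
    """
    mu = [math.exp(fi) for fi in f]
    z = [(yi - mi) / mi for yi, mi in zip(y, mu)]
    w = list(mu)
    return z, w

# Step 1 (assumed): MLE of a constant in link space is log(mean(y)).
y = [0, 2, 1, 4, 3]  # toy counts, not data from the paper
f0 = [math.log(sum(y) / len(y))] * len(y)
z, w = poisson_working_residuals(y, f0)

# Step 2: a weighted regression of z on X would go here (a weighted random
# forest in the paper). An intercept-only stand-in gives the weighted mean,
# which is zero because the constant start is already the MLE.
update = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
```

The zero intercept update is a useful sanity check: once the constant MLE is in place, any further gain must come from learners that use the covariates.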

random forest, response space, variance estimate

2102.12561

Country: North America > United States > New York > Tompkins County > Ithaca (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

#artificialintelligence, Feb-28-2021, 22:10:15 GMT

The Glory of XGBoost

There are so many machine learning algorithms out there; how do you choose the best one for your problem? The answer depends on the application and the data. Is it classification, regression, supervised, unsupervised, natural language processing, time series? There are many avenues to take, but in this article I am going to focus on one algorithm that I find particularly interesting: XGBoost. XGBoost stands for extreme gradient boosting and is an open-source library that provides an efficient and effective implementation of gradient boosting.
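
What XGBoost adds on top of plain gradient boosting is easiest to see in its split criterion. As described in the XGBoost paper and documentation, each tree is fit to a second-order approximation of the loss: with $G$ and $H$ the sums of gradients and hessians in a leaf and an L2 penalty $\lambda$, the best leaf value is $-G/(H+\lambda)$, and a split is scored by the gain below minus a complexity cost $\gamma$. The plain-Python sketch reproduces those two formulas for squared loss; it is not the xgboost library API, and the function names are hypothetical. The arguments `lam` and `gamma` correspond to the regularization terms XGBoost calls `lambda` and `gamma`.

```python
def leaf_weight(G, H, lam=1.0):
    """Optimal leaf value w* = -G / (H + lambda) from the second-order objective."""
    return -G / (H + lam)

def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Reduction in the regularized objective from splitting one leaf into two."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma

# Squared-error loss l = (y - yhat)^2 / 2: gradient g = yhat - y, hessian h = 1.
y = [1.0, 2.0, 6.0, 7.0]
yhat = [4.0] * 4                       # current prediction (here the mean)
g = [p - t for p, t in zip(yhat, y)]   # [3, 2, -2, -3]
h = [1.0] * 4
GL, HL = sum(g[:2]), sum(h[:2])        # left leaf: y = 1, 2
GR, HR = sum(g[2:]), sum(h[2:])        # right leaf: y = 6, 7
```

With `lam=0.0` the left leaf weight is -2.5, which moves the prediction from 4 to 1.5, the mean of that leaf; a positive `lam` shrinks the step toward zero, which is one of the regularization choices that distinguishes XGBoost's objective.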

gradient, loss function, xgboost

Industry: Leisure & Entertainment > Sports > Baseball (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

#artificialintelligence, Feb-28-2021, 08:15:24 GMT

Machine Learning May Reduce Mental Health Misdiagnosis

Depressive episodes in bipolar disorder can be indistinguishable from those in major depressive disorder, leading to misdiagnosis and poor subsequent outcomes. Approximately 40% of patients with bipolar disorder are initially diagnosed with major depressive disorder, and the average delay in a bipolar diagnosis ranges from 5.7 to 7.5 years. Using self-reported data together with blood biomarker data, a machine learning algorithm called Extreme Gradient Boosting (XGBoost) was able to distinguish between bipolar disorder and major depressive disorder. The predictive capabilities of artificial intelligence (AI) can assist researchers and clinicians in disciplines characterized by complexity and nuance. Machine learning is increasingly being used in the life sciences, biotechnology, and mental health.

bipolar disorder, disorder, major depressive disorder

Country:

North America > United States > California > San Diego County > San Diego (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)

Genre: Research Report (0.31)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.58)

#artificialintelligence, Feb-27-2021, 04:50:32 GMT

Accurate classification of COVID‐19 patients with different severity via machine learning

Infection with severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) could cause dramatic responses at the multi‐omics level in coronavirus disease 2019 (COVID‐19) patients [1-3], so it is essential to systematically assess the pathogenesis of COVID‐19. In our previous study, we presented the first trans‐omics landscape of 236 COVID‐19 patients in 4 clinical severity groups (asymptomatic, mild, severe and critically ill cases) and found that mild and severe COVID‐19 patients shared several similar characteristics [4]. However, it is crucial to discriminate mild from severe COVID‐19 patients so that early intervention can prevent disease progression in the latter. Herein, we developed an extreme gradient boosting (XGBoost) machine‐learning model to predict COVID‐19 severity from multi‐omics data. Briefly, we used stratified random sampling to split the samples into a training set (80%) and an independent testing set (20%) (Figure 1A, see Methods in the Supporting Information).
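
The exact splitting procedure is in the Supporting Information, which is not reproduced here. A generic stratified 80/20 split, assumed to stratify by severity group, shuffles each group separately and sends the same fraction of every group to the test set, so class proportions match across the two sets. The function name, seed, and toy labels below are hypothetical; in scikit-learn, `train_test_split(..., test_size=0.2, stratify=labels)` provides the same behavior.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for lab in sorted(by_class):
        idx = by_class[lab][:]
        rng.shuffle(idx)                       # random order within the class
        n_test = round(len(idx) * test_frac)   # same fraction from every class
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

# Toy labels, hypothetical and not the study's data:
labels = ["mild"] * 40 + ["severe"] * 20
train_idx, test_idx = stratified_split(labels)
```

Stratifying matters most when a severity group is small: an unstratified random 20% could leave few or no cases of that group in the test set.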

omic feature, severity, xgboost model

Genre:

Research Report > New Finding (0.49)
Research Report > Experimental Study (0.31)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.63)

Bénard, Clément, da Veiga, Sébastien, Scornet, Erwan

MDA for random forests: inconsistency, and a practical solution via the Sobol-MDA

arXiv.org Machine Learning, Feb-26-2021

Variable importance measures are the main tools to analyze the black-box mechanism of random forests. Although the Mean Decrease Accuracy (MDA) is widely accepted as the most efficient variable importance measure for random forests, little is known about its theoretical properties. In fact, the exact MDA definition varies across the main random forest software. In this article, our objective is to rigorously analyze the behavior of the main MDA implementations. To this end, we mathematically formalize the various implemented MDA algorithms and then establish their limits as the sample size increases. In particular, we break these limits down into three components: the first two are related to Sobol indices, which are well-defined measures of a variable's contribution to the output variance and are widely used in sensitivity analysis, whereas the value of the third term increases with the dependence among input variables. Thus, we theoretically demonstrate that the MDA does not target the right quantity when inputs are dependent, a fact that had already been observed experimentally. To address this issue, we define a new importance measure for random forests, the Sobol-MDA, which fixes the flaws of the original MDA. We prove the consistency of the Sobol-MDA and show its good empirical performance through experiments on both simulated and real data. An open-source implementation in R and C++ is available online.
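
Because the abstract stresses that MDA definitions differ between packages, it helps to see one common variant concretely. The sketch below assumes a regression model and a held-out set: it permutes one input column, measures the increase in mean squared error, and averages over repeats. This is the plain permutation MDA whose limit the paper analyzes, not the Sobol-MDA, and the function name and arguments are hypothetical. Permuting column `j` also breaks its dependence with the other inputs, so with correlated inputs the model is queried at unrealistic points; that is one intuition for why dependence inflates the third term.

```python
import random

def permutation_mda(predict, X, y, j, n_repeats=10, seed=0):
    """Mean increase in squared error after permuting column j of X.

    predict: function mapping a list of feature rows to a list of predictions.
    X: list of rows (lists); y: list of targets, ideally from a held-out set.
    """
    rng = random.Random(seed)

    def mse(pred):
        return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

    base = mse(predict(X))
    increases = []
    for _ in range(n_repeats):
        col = [row[j] for row in X]
        rng.shuffle(col)  # break the link between feature j and the target
        Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        increases.append(mse(predict(Xp)) - base)
    return sum(increases) / n_repeats
```

A model that ignores a feature gets exactly zero importance for it, while a feature the model relies on gets a positive score.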

assumption, mda, random forest

2102.13347

Country:

North America > United States > New York (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre:

Overview (0.67)
Research Report (0.49)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Transportation > Air (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligence, Feb-25-2021, 23:45:50 GMT

How to use PyCaret -- the library for lazy data scientists

When we approach supervised machine learning problems, it can be tempting to just see how a random forest or gradient boosting model performs and stop experimenting if we are satisfied with the results. What if you could compare many different models with just one line of code? What if you could reduce each step of the data science process from feature engineering to model deployment to just a few lines of code? This is exactly where PyCaret comes into play. PyCaret is a high-level, low-code Python library that makes it easy to compare, train, evaluate, tune, and deploy machine learning models with only a few lines of code.

library, model function, pycaret

Country: North America > United States > California (0.05)

Industry: Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)

arXiv.org Machine Learning, Feb-18-2021

Tree boosting for learning probability measures

Awaya, Naoki, Ma, Li

Learning probability measures based on an i.i.d. sample is a fundamental inference task, but is challenging when the sample space is high-dimensional. Inspired by the success of tree boosting in high-dimensional classification and regression, we propose a tree boosting method for learning high-dimensional probability distributions. We formulate concepts of "addition" and "residuals" on probability distributions in terms of compositions of a new, more general notion of multivariate cumulative distribution functions (CDFs) than classical CDFs. This then gives rise to a simple boosting algorithm based on forward-stagewise (FS) fitting of an additive ensemble of measures. The output of the FS algorithm allows analytic computation of the probability density function for the fitted distribution. It also provides an exact simulator for drawing independent Monte Carlo samples from the fitted measure. Typical considerations in applying boosting -- namely choosing the number of trees, setting the appropriate level of shrinkage/regularization in the weak learner, and the evaluation of variable importance -- can be accomplished in an analogous fashion to traditional boosting in supervised learning. Numerical experiments confirm that boosting can substantially improve the fit to multivariate distributions compared to the state-of-the-art single-tree learner and is computationally efficient. We illustrate through an application to a data set from mass cytometry how the simulator can be used to investigate various aspects of the underlying distribution.
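
Why composition gives both an analytic density and an exact sampler can be seen in a univariate analogue. This is a simplified illustration, not the paper's multivariate construction, and it assumes continuous, strictly increasing CDFs: $G_1$ is fitted to the data, each later $G_k$ is a CDF on $[0,1]$ fitted to the previous step's residuals, and the empty composition for $k=1$ is the identity.

```latex
\begin{aligned}
u_i^{(1)} &= G_1(x_i), \qquad u_i^{(k)} = G_k\big(u_i^{(k-1)}\big)
  && \text{residuals, close to } \mathrm{Unif}(0,1) \text{ when the fit is good} \\
F(x) &= G_K \circ \cdots \circ G_2 \circ G_1(x)
  && \text{composition of } K \text{ fitted steps} \\
f(x) &= \prod_{k=1}^{K} g_k\big(G_{k-1} \circ \cdots \circ G_1(x)\big), \quad g_k = G_k'
  && \text{chain rule} \\
X &= G_1^{-1} \circ \cdots \circ G_K^{-1}(U), \quad U \sim \mathrm{Unif}(0,1)
  && \text{since } P(X \le x) = P\big(U \le F(x)\big) = F(x)
\end{aligned}
```

Each boosting step only needs to model what the previous steps left non-uniform, which is the distributional counterpart of fitting a tree to regression residuals.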

algorithm, learner, probability measure

2101.11083

Country: North America > United States > North Carolina > Durham County > Durham (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

arXiv.org Artificial Intelligence, Feb-17-2021

BEDS: Bagging ensemble deep segmentation for nucleus segmentation with testing stage stain augmentation

Li, Xing, Yang, Haichun, He, Jiaxin, Jha, Aadarsh, Fogo, Agnes B., Wheless, Lee E., Zhao, Shilin, Huo, Yuankai

Reducing outcome variance is an essential task in deep-learning-based medical image analysis. Bootstrap aggregating, also known as bagging, is a canonical ensemble algorithm for aggregating weak learners into a strong learner. Random forest was one of the most powerful machine learning algorithms before the deep learning era, and its superior performance is driven by fitting bagged decision trees (weak learners). Inspired by the random forest technique, we propose a simple bagging ensemble deep segmentation (BEDs) method that trains multiple U-Nets on partial training data to segment dense nuclei in pathological images. The contributions of this study are three-fold: (1) developing a self-ensemble learning framework for nucleus segmentation; (2) aggregating testing stage augmentation with self-ensemble learning; and (3) elucidating the idea that self-ensemble and testing stage stain augmentation are complementary strategies for superior segmentation performance. Implementation Detail: https://github.com/xingli1102/BEDs.
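
The two ensemble ingredients named in the abstract, training each model on a partial subset of the data and fusing the models' outputs, can be sketched without any deep learning code. The sampling scheme (a random half of the images per model, without replacement) and the fusion rule (pixel-wise mean of foreground probabilities, thresholded at 0.5) are assumptions for illustration and may differ from the linked repository; classic bagging samples with replacement instead. Function names are hypothetical, and the U-Net training itself is omitted.

```python
import random

def partial_training_subsets(n_images, n_models, frac=0.5, seed=0):
    """Draw one random partial training subset per model (without replacement)."""
    rng = random.Random(seed)
    k = max(1, int(n_images * frac))
    return [sorted(rng.sample(range(n_images), k)) for _ in range(n_models)]

def fuse_masks(prob_maps, threshold=0.5):
    """Average per-pixel foreground probabilities across models, then threshold.

    prob_maps: list of 2-D lists (one H x W probability map per model).
    """
    n = len(prob_maps)
    h, w = len(prob_maps[0]), len(prob_maps[0][0])
    mean = [[sum(m[r][c] for m in prob_maps) / n for c in range(w)]
            for r in range(h)]
    return [[1 if mean[r][c] >= threshold else 0 for c in range(w)]
            for r in range(h)]
```

The same `fuse_masks` step could also pool predictions made on several stain-augmented copies of a test image, which is one way to read the abstract's claim that self-ensembling and testing stage augmentation are complementary.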

augmentation, nucleus segmentation, segmentation

2102.0899

Country: North America > United States > Tennessee > Davidson County > Nashville (0.06)

Genre: Research Report (0.84)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.47)
Health & Medicine > Diagnostic Medicine > Imaging (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)