AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Du, Qiming, Biau, Gérard, Petit, François, Porcher, Raphaël

Wasserstein Random Forests and Applications in Heterogeneous Treatment Effects

arXiv.org Machine Learning, Oct-23-2020

We present new insights into causal inference in the context of Heterogeneous Treatment Effects by proposing natural variants of Random Forests to estimate the key conditional distributions. To achieve this, we recast Breiman's original splitting criterion in terms of Wasserstein distances between empirical measures. This reformulation indicates that Random Forests are well adapted to estimate conditional distributions and provides a natural extension of the algorithm to multivariate outputs. Following the philosophy of Breiman's construction, we propose some variants of the splitting rule that are well-suited to the conditional distribution estimation problem. Some preliminary theoretical connections are established along with various numerical experiments, which show how our approach may help to conduct more transparent causal inference in complex situations.
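As a rough illustration of the two quantities the abstract relates, the sketch below computes a Breiman-style (CART) variance reduction for a candidate split and the Wasserstein distance between the empirical measures of the two children. It is a minimal sketch for univariate outputs and equal-sized children only, not the splitting rule proposed in the paper; the function names and the toy data are assumptions made for illustration.

```python
from statistics import pvariance

def variance_reduction(left, right):
    """CART-style decrease in empirical variance when a node is split in two."""
    parent = left + right
    n = len(parent)
    return (pvariance(parent)
            - len(left) / n * pvariance(left)
            - len(right) / n * pvariance(right))

def wasserstein_p(xs, ys, p=2):
    """W_p distance between two univariate empirical measures of equal size."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    # In one dimension the optimal coupling pairs the sorted samples in order.
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
    return cost ** (1.0 / p)

# Toy outputs falling in the left and right child of a candidate split.
left = [1.0, 2.0, 3.0]
right = [6.0, 7.0, 8.0]

gain = variance_reduction(left, right)
w2 = wasserstein_p(left, right)
print(gain, w2)
```

In this toy case the right child is a shifted copy of the left one, so the Wasserstein distance reduces to the gap between the child means, and the variance reduction equals (n_left * n_right / n^2) times that gap squared.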

artificial intelligence, estimation, machine learning, (18 more...)

2006.04709

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.82)

Tan, Chang Wei, Bergmeir, Christoph, Petitjean, Francois, Webb, Geoffrey I.

Time Series Extrinsic Regression

arXiv.org Machine Learning, Oct-20-2020

This paper studies Time Series Extrinsic Regression (TSER): a regression task whose aim is to learn the relationship between a time series and a continuous scalar variable. The task is closely related to time series classification (TSC), which aims to learn the relationship between a time series and a categorical class label. This task generalizes time series forecasting (TSF), relaxing the requirement that the value predicted be a future value of the input series or primarily depend on more recent values. In this paper, we motivate and study this task, and benchmark existing solutions and adaptations of TSC algorithms on a novel archive of 19 TSER datasets which we have assembled. Our results show that the state-of-the-art TSC algorithm Rocket, when adapted for regression, achieves the highest overall accuracy compared to adaptations of other TSC algorithms and state-of-the-art machine learning (ML) algorithms such as XGBoost, Random Forest and Support Vector Regression. More importantly, we show that much research is needed in this field to improve the accuracy of ML models. We also find evidence that further research has excellent prospects of improving upon these straightforward baselines.

algorithm, artificial intelligence, machine learning, (14 more...)

2006.12672

Country:

Europe > Italy (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Wisconsin (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

#artificialintelligence, Oct-16-2020, 12:10:40 GMT

Fast Gradient Boosting with CatBoost - KDnuggets

artificial intelligence, catboost, machine learning, (9 more...)

Country: Africa (0.06)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

arXiv.org Machine Learning, Oct-16-2020

Causal Transfer Random Forest: Combining Logged Data and Randomized Experiments for Robust Prediction

Zeng, Shuxi, Bayir, Murat Ali, Pfeiffer, Joel, Charles, Denis, Kiciman, Emre

It is often critical for prediction models to be robust to distributional shifts between training and testing data. Viewed from a causal perspective, the challenge is to distinguish the stable causal relationships from the unstable spurious correlations across shifts. We describe a causal transfer random forest (CTRF) that combines existing training data with a small amount of data from a randomized experiment to train a model which is robust to the feature shifts and therefore transfers to a new targeting distribution. Theoretically, we justify the robustness of the approach against feature shifts with the knowledge from causal learning. Empirically, we evaluate the CTRF using both synthetic data experiments and real-world experiments in the Bing Ads platform, including a click prediction task and in the context of an end-to-end counterfactual optimization system. The proposed CTRF produces robust predictions and outperforms most baseline methods compared in the presence of feature shifts.

artificial intelligence, correlation, machine learning, (19 more...)

2010.0871

Country:

Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Marketing (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

#artificialintelligence, Oct-15-2020, 15:33:12 GMT

Random Forests

Random forests can also be used to identify likely fraudulent transactions. For example, each transaction in a bank has a series of features such as the deviation from the mean transaction volume of the customer, the time of day, the location, and how these values differ from that customer's usual habits. This allows a bank to build a sophisticated model to predict the likelihood of a given transaction being fraudulent. If the probability of fraud exceeds a threshold, such as 50%, the bank can take action, such as freezing the card.
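The thresholding step described above can be sketched as follows. This is a toy stand-in rather than a production random forest: each "tree" is a one-split stump trained on a bootstrap sample with one randomly chosen feature, and the fraud probability is the fraction of trees voting "fraud". The two features (deviation of the amount and of the location from the customer's habits), the data, and all function names are hypothetical.

```python
import random

def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that minimizes training errors."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        # The stump predicts fraud (1) when the feature value exceeds t.
        errors = sum((r[feature] > t) != bool(y) for r, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, t)
    return feature, best[1]

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bagged stumps: bootstrap the rows and draw one random feature per tree."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feature = rng.randrange(d)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def fraud_probability(forest, row):
    """Fraction of trees that vote 'fraud' for this transaction."""
    return sum(row[f] > t for f, t in forest) / len(forest)

# Hypothetical training data: (amount deviation, location deviation).
legit = [(0.1, 0.3), (0.4, 0.2), (0.2, 0.8), (0.9, 0.5), (0.6, 0.1), (0.3, 0.7)]
fraud = [(3.2, 3.9), (3.8, 3.1), (3.5, 3.4), (3.1, 3.6), (3.9, 3.3), (3.4, 3.7)]
rows = legit + fraud
labels = [0] * len(legit) + [1] * len(fraud)

forest = fit_forest(rows, labels)
THRESHOLD = 0.5  # the 50% cut-off mentioned in the text

for tx in [(4.5, 5.0), (0.05, 0.05)]:
    p = fraud_probability(forest, tx)
    print(tx, p, "freeze card" if p > THRESHOLD else "allow")
```

In practice the cut-off would be tuned to the relative cost of missed fraud versus false alarms rather than fixed at 50%.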

artificial intelligence, decision tree learning, random forest, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)

Konstantinov, Andrei V., Utkin, Lev V.

Interpretable Machine Learning with an Ensemble of Gradient Boosting Machines

arXiv.org Machine Learning, Oct-14-2020

A method for the local and global interpretation of a black-box model on the basis of the well-known generalized additive models is proposed. It can be viewed as an extension or a modification of the algorithm using the neural additive model. The method is based on using an ensemble of gradient boosting machines (GBMs) such that each GBM is learned on a single feature and produces a shape function of the feature. The ensemble is composed as a weighted sum of separate GBMs, resulting in a weighted sum of shape functions which form the generalized additive model. GBMs are built in parallel using randomized decision trees of depth 1, which provide a very simple architecture. Weights of GBMs as well as features are computed in each iteration of boosting by using the Lasso method and then updated by means of a specific smoothing procedure. In contrast to the neural additive model, the method provides weights of features in the explicit form, and it is simple to train. A lot of numerical experiments with an algorithm implementing the proposed method on synthetic and real datasets demonstrate its efficiency and properties for local and global interpretation.
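The core idea of one boosted model per feature, each built from depth-1 trees and each yielding a shape function, can be sketched as below. This is a simplified illustration and not the authors' algorithm: the stumps are deterministic rather than randomized, all features get equal weight, and the Lasso weighting and smoothing steps are omitted. The function names and the toy data are assumptions, and every feature is assumed to take at least two distinct values.

```python
def fit_stump(xs, residuals):
    """Least-squares depth-1 regression tree on a single feature."""
    best = None
    for t in sorted(set(xs))[:-1]:  # thresholds that keep both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def fit_additive_gbm(X, y, n_rounds=300, lr=0.1):
    """Boost one stump per feature per round; each feature keeps its own stumps."""
    n, d = len(X), len(X[0])
    intercept = sum(y) / n
    shapes = [[] for _ in range(d)]  # shapes[k] holds the shape function of feature k
    pred = [intercept] * n
    for _ in range(n_rounds):
        for k in range(d):
            col = [row[k] for row in X]
            resid = [yi - pi for yi, pi in zip(y, pred)]
            t, lv, rv = fit_stump(col, resid)
            shapes[k].append((t, lr * lv, lr * rv))
            pred = [p + (lr * lv if x <= t else lr * rv) for p, x in zip(pred, col)]
    return intercept, shapes

def shape_value(stumps, x):
    """Evaluate one feature's shape function at x."""
    return sum(lv if x <= t else rv for t, lv, rv in stumps)

def predict(model, row):
    intercept, shapes = model
    return intercept + sum(shape_value(s, x) for s, x in zip(shapes, row))

# Toy additive target: linear in feature 0, quadratic in feature 1.
X = [(i, j) for i in range(5) for j in range(5)]
y = [2.0 * i + (j - 2) ** 2 for i, j in X]

model = fit_additive_gbm(X, y)
mean_y = sum(y) / len(y)
baseline_mse = sum((yi - mean_y) ** 2 for yi in y) / len(y)
final_mse = sum((yi - predict(model, row)) ** 2 for row, yi in zip(X, y)) / len(y)
print(baseline_mse, final_mse)
```

Plotting shape_value(shapes[k], x) against x for each feature k gives the per-feature curves that make an additive model of this kind interpretable.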

artificial intelligence, interpretation, machine learning, (17 more...)

2010.07388

Country:

Asia > Russia (0.14)
North America > United States > Wisconsin (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.90)

Mei, Jie, Desrosiers, Christian, Frasnelli, Johannes

Machine learning for the diagnosis of Parkinson's disease: A systematic review

arXiv.org Machine Learning, Oct-12-2020

Diagnosis of Parkinson's disease (PD) is commonly based on medical observations and assessment of clinical signs, including the characterization of a variety of motor symptoms. However, traditional diagnostic approaches may suffer from subjectivity as they rely on the evaluation of movements that are sometimes subtle to human eyes and therefore difficult to classify, leading to possible misclassification. In the meantime, early non-motor symptoms of PD may be mild and can be caused by many other conditions. Therefore, these symptoms are often overlooked, making diagnosis of PD at an early stage challenging. To address these difficulties and to refine the diagnosis and assessment procedures of PD, machine learning methods have been implemented for the classification of PD and healthy controls or patients with similar clinical presentations (e.g., movement disorders or other Parkinsonian syndromes). To provide a comprehensive overview of data modalities and machine learning methods that have been used in the diagnosis and differential diagnosis of PD, in this study, we conducted a systematic literature review of studies published until February 14, 2020, using the PubMed and IEEE Xplore databases. A total of 209 studies were included, extracted for relevant information and presented in this systematic review, with an investigation of their aims, sources of data, types of data, machine learning methods and associated outcomes. These studies demonstrate a high potential for adaptation of machine learning methods and novel biomarkers in clinical decision making, leading to increasingly systematic, informed diagnosis of PD.

artificial intelligence, fuzzy logic, machine learning, (18 more...)

2010.06101

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Quebec > Mauricie Region > Trois-Rivières (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Konstantinov, Andrei V., Utkin, Lev V.

A Generalized Stacking for Implementing Ensembles of Gradient Boosting Machines

arXiv.org Machine Learning, Oct-12-2020

The gradient boosting machine is a powerful tool for solving regression problems. In order to cope with its shortcomings, an approach for constructing ensembles of gradient boosting models is proposed. The main idea behind the approach is to use the stacking algorithm in order to learn a second-level meta-model which can be regarded as a model for implementing various ensembles of gradient boosting models. First, the linear regression of the gradient boosting models is considered as the simplest realization of the meta-model, under the condition that the linear model is differentiable with respect to its coefficients (weights). Then it is shown that the proposed approach can be simply extended to arbitrary differentiable combination models, for example, to neural networks which are differentiable and can implement arbitrary functions of gradient boosting models. Various numerical examples illustrate the proposed approach.
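The simplest case mentioned in the abstract, a linear meta-model that is differentiable in its weights, can be sketched as follows. This is only an illustration under strong assumptions: the two base models are fixed stand-in functions rather than trained gradient boosting machines, and the weights are learned by plain gradient descent on the mean squared error. All names and the toy data are hypothetical.

```python
def base_model_1(x):
    """Stand-in for a first, already trained gradient boosting model."""
    return x

def base_model_2(x):
    """Stand-in for a second, already trained gradient boosting model."""
    return x * x

def fit_linear_meta(preds, y, lr=0.5, n_steps=500):
    """Learn combination weights by gradient descent on the mean squared error."""
    n, m = len(preds), len(preds[0])
    w = [0.0] * m
    for _ in range(n_steps):
        # Residual of the current weighted combination on every sample.
        resid = [sum(wk * pk for wk, pk in zip(w, p)) - yi for p, yi in zip(preds, y)]
        # Gradient of the MSE with respect to each weight.
        grad = [2.0 / n * sum(r * p[k] for r, p in zip(resid, preds)) for k in range(m)]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

# Data on which the meta-model is fitted (in real stacking: held-out data).
xs = [i / 10 - 1 for i in range(21)]                # grid on [-1, 1]
preds = [(base_model_1(x), base_model_2(x)) for x in xs]
y = [0.3 * p1 + 0.7 * p2 for p1, p2 in preds]       # known target combination

weights = fit_linear_meta(preds, y)
print(weights)
```

Because only the gradient of the loss with respect to the weights is needed, the same loop carries over to any differentiable combination model, which is the extension the abstract describes.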

artificial intelligence, ensemble, machine learning, (15 more...)

2010.06026

Country:

Asia > Russia (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.90)

#artificialintelligence, Oct-10-2020, 11:21:22 GMT

Random Forests Classifiers in Python

If you are not yet familiar with Tree-Based Models in Machine Learning, you should take a look at our R course on the subject. Let's understand the algorithm in layman's terms. Suppose you want to go on a trip and you would like to travel to a place which you will enjoy. So what do you do to find a place that you will like? You can search online, read reviews on travel blogs and portals, or you can also ask your friends.

artificial intelligence, decision tree learning, machine learning, (17 more...)

Country: North America > United States > Virginia (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.45)