AITopics

Genre: Instructional Material > Course Syllabus & Notes (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Machine Learning, Feb-11-2020

Interpretable Companions for Black-Box Models

Pan, Danqing, Wang, Tong, Hara, Satoshi

We present an interpretable companion model for any pre-trained black-box classifier. The idea is that, for any input, a user can decide either to receive a prediction from the black-box model, with high accuracy but no explanation, or to employ a companion rule to obtain an interpretable prediction with slightly lower accuracy. The companion model is trained from data and the predictions of the black-box model, with an objective that combines the area under the transparency-accuracy curve and model complexity. Our model provides flexible choices for practitioners who face the dilemma of choosing between always using interpretable models and always using black-box models for a predictive task: for any given input, users can fall back on an interpretable prediction if they find its predictive performance satisfactory, or stick to the black-box model if the rules are unsatisfactory. To show the value of companion models, we design a human evaluation with more than a hundred people to investigate how much accuracy loss humans will tolerate to gain interpretability.
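The per-input choice described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two hand-written rules, the stand-in `black_box` function and the `prefer_interpretable` flag are hypothetical and are not the paper's trained companion model.

```python
from typing import Optional, Sequence, Tuple

Features = Sequence[float]


def companion_rule(x: Features) -> Optional[int]:
    """Hypothetical rule list: return a label if a rule fires, else None."""
    if x[0] > 0.8:
        return 1
    if x[1] < 0.2:
        return 0
    return None  # no rule covers this input


def black_box(x: Features) -> int:
    """Stand-in for any pre-trained classifier (assumed, not from the paper)."""
    return int(x[0] + x[1] > 1.0)


def predict(x: Features, prefer_interpretable: bool) -> Tuple[int, str]:
    """Return (label, source), where source says which model answered."""
    if prefer_interpretable:
        label = companion_rule(x)
        if label is not None:
            return label, "rule"
    # Fall back to the black box when no rule applies or the user opts out.
    return black_box(x), "black-box"
```

The second element of the returned tuple makes explicit whether a prediction comes with a human-readable rule or only with the black-box output.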

accuracy, black-box model, rule list, (14 more...)

2002.03494

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
North America > United States > Iowa (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)

#artificialintelligence, Feb-10-2020, 07:13:17 GMT

Ensemble Methods for Decision Trees

Decision trees are popular machine learning algorithms used for both regression and classification tasks. Their popularity mainly arises from their interpretability and representability, as they mimic the way humans make decisions. However, this interpretability comes at a price in prediction accuracy. To overcome this limitation, some techniques have been developed with the goal of creating strong and robust models starting from 'poor' models. Those techniques are known as 'ensemble' methods and, in this article, I'm going to talk about three of them: Bagging, Random Forest and Boosting.
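As a rough illustration of the first of these methods, the sketch below bags one-split trees ("stumps") on a toy one-dimensional dataset using only the Python standard library. The helper names (`fit_stump`, `bagging`, `vote`), the toy data and the choice of 25 models are assumptions for this example and do not come from the article; Random Forest and Boosting are not shown.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-split tree on 1-D data by minimising training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((right if x >= t else left) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: right if x >= t else left


def bagging(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def vote(models, x):
    """Aggregate the individual predictions by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


# Toy data: label is 0 for x < 5 and 1 otherwise.
xs = list(range(10))
ys = [0] * 5 + [1] * 5
models = bagging(xs, ys)
```

Each stump sees a slightly different resample of the data, so individual thresholds vary; the vote averages that variation out.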

dataset, decision tree, predictor, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.65)

He, Jingyu, Hahn, P. Richard

Stochastic tree ensembles for regularized nonlinear regression

arXiv.org Machine Learning, Feb-9-2020

Tree-based algorithms for supervised learning, such as Classification and Regression Trees (CART) (Breiman et al., 1984), random forests (Breiman, 1996, 2001), AdaBoost (Freund and Schapire, 1997), and gradient boosting (Breiman, 1997; Friedman, 2001, 2002), are widely used for applied supervised learning. As a whole, these methods are popular in applied settings due to their speed and accuracy in mean estimation and out-of-sample prediction tasks. One limitation of such methods is their well-known sensitivity to tuning parameters, which require costly cross-validation to optimize. Bayesian additive regression trees (BART) (Chipman et al., 2007, 2010) is a popular model-based alternative that is often more accurate than other tree-based methods; specifically, BART boasts valuable robustness to the choice of tuning parameters. However, relative to random forests and boosting, BART's wider adoption has been slowed by its more severe computational demands, owing to its reliance on a random walk Metropolis-Hastings Markov chain Monte Carlo (MCMC) algorithm. Despite this limitation, BART has inspired a considerable body of research in recent years.

algorithm, node, split criterion, (14 more...)

2002.03375

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California (0.04)
North America > United States > Arizona (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(2 more...)

#artificialintelligence, Feb-7-2020, 05:58:12 GMT

Machine Learning Tutorial Part 4 Machine Learning For Beginners - Python Decision Tree

Learn how to implement a decision tree using the Python programming language. You will learn how to train and display your implementation of the decision tree classifier.

beginner, machine learning tutorial part 4, python decision tree

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligence, Feb-5-2020, 04:20:30 GMT

The Math of Decision Trees, Random Forest and Feature Importance in Scikit-learn and Spark

This post attempts to consolidate information on tree algorithms and their implementations in Scikit-learn and Spark. In particular, it was written to provide clarification on how feature importance is calculated.
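For orientation, here is a minimal sketch of the impurity-decrease calculation usually described for Gini-based feature importance in decision trees. It is written in plain Python rather than against either library, and the function names and the two-feature example are hypothetical; the post itself should be consulted for the exact Scikit-learn and Spark conventions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right, n_total):
    """Weighted impurity decrease contributed by one split.

    parent, left, right are the label lists at the node and its children;
    n_total is the number of training samples in the whole tree.
    """
    n = len(parent)
    children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return (n / n_total) * (gini(parent) - children)


def normalise(totals):
    """Scale per-feature totals so that the importances sum to 1."""
    s = sum(totals.values())
    return {feature: value / s for feature, value in totals.items()}


# Hypothetical tree: one split on "age", one split on "income".
totals = {
    "age": impurity_decrease([0, 0, 1, 1], [0, 0], [1, 1], n_total=4),
    "income": impurity_decrease([0, 0, 0, 1], [0, 0], [0, 1], n_total=4),
}
importances = normalise(totals)
```

A feature's importance is the sum of the decreases from every node that splits on it, divided by the sum over all features.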

decision tree, random forest and feature importance, scikit-learn and spark, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.85)

Wang, Sen, Chang, J. Morris

Privacy-Preserving Boosting in the Local Setting

arXiv.org Machine Learning, Feb-5-2020

In machine learning, boosting is one of the most popular methods designed to combine multiple base learners into a superior one. The well-known Boosted Decision Tree classifier has been widely adopted in many areas. In the big data era, the data held by individuals and entities, such as personal images, browsing history and census information, are more likely to contain sensitive information. Privacy concerns arise when such data leave the hands of their owners and are further explored or mined. Such privacy issues demand that machine learning algorithms be privacy-aware. Recently, Local Differential Privacy has been proposed as an effective privacy protection approach, which offers a strong guarantee to the data owners, as the data is perturbed before any further usage and the true values never leave the hands of the owners. Thus, machine learning algorithms that work with private data instances are of great value and importance. In this paper, we are interested in developing a privacy-preserving boosting algorithm that allows a data user to build a classifier without knowing or deriving the exact value of each data sample. Our experiments demonstrate the effectiveness of the proposed boosting algorithm and the high utility of the learned classifiers.
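To make the "perturbed before any further usage" idea concrete, the sketch below uses randomized response, a standard local differential privacy mechanism for a single bit. It is an assumption chosen for illustration and is not claimed to be the mechanism used in the paper; `perturb_bit` and `estimate_mean` are hypothetical names.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit under randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def perturb_bit(bit, epsilon, rng):
    """Run on the owner's side: flip the bit with probability 1 - p_keep."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def estimate_mean(reports, epsilon):
    """Run on the data user's side: unbiased estimate of the true mean.

    E[report] = (2p - 1) * mean + (1 - p), so the mean is recovered by
    inverting that relation on the observed average.
    """
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


rng = random.Random(0)
true_bits = [1, 0, 1, 1, 0, 0, 1, 0]
reports = [perturb_bit(b, epsilon=1.0, rng=rng) for b in true_bits]
```

The data user only ever sees `reports`; a smaller `epsilon` gives stronger privacy but a noisier estimate.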

algorithm, classifier, data owner, (16 more...)

2002.02096

Country:

North America > United States > Florida > Hillsborough County > Tampa (0.14)
South America > Brazil (0.05)
North America > United States > North Carolina (0.04)
(8 more...)

Genre: Research Report (0.83)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.90)
(2 more...)

Ju, Xiaomeng, Salibián-Barrera, Matías

Robust Boosting for Regression Problems

arXiv.org Machine Learning, Feb-5-2020

The gradient boosting algorithm constructs a regression estimator using a linear combination of simple "base learners". In order to obtain a robust non-parametric regression estimator that is scalable to high-dimensional problems, we propose a robust boosting algorithm based on a two-stage approach, similar to what is done for robust linear regression: we first minimize a robust residual scale estimator, and then improve its efficiency by optimizing a bounded loss function. Unlike previous proposals, our algorithm does not need to compute an ad-hoc residual scale estimator in each step. Since our loss functions are typically non-convex, we propose initializing our algorithm with an $L_1$ regression tree, which is fast to compute. We also introduce a robust variable importance metric for variable selection that is calculated via a permutation procedure. Through simulated and real data experiments, we compare our method against gradient boosting with squared loss and other robust boosting methods in the literature. With clean data, our method works as well as gradient boosting with the squared loss. With symmetrically and asymmetrically contaminated data, we show that our proposed method outperforms the alternatives in terms of prediction error and variable selection accuracy.
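The role of a bounded loss can be seen by comparing it with the squared loss on a large residual. The sketch below uses Tukey's bisquare function as one common example of a bounded loss; that choice and the tuning constant `c = 4.685` are assumptions for illustration, not details taken from the abstract.

```python
def squared_loss(r):
    """Unbounded: a single outlier can dominate the total loss."""
    return 0.5 * r * r


def tukey_bisquare(r, c=4.685):
    """Bounded loss: grows like r^2 / 2 near zero, flat at c^2 / 6 beyond c."""
    if abs(r) >= c:
        return c * c / 6.0
    u = 1.0 - (r / c) ** 2
    return (c * c / 6.0) * (1.0 - u ** 3)


# Residuals from a hypothetical fit; the last one is a gross outlier.
residuals = [0.1, -0.3, 0.2, 50.0]
total_squared = sum(squared_loss(r) for r in residuals)
total_bounded = sum(tukey_bisquare(r) for r in residuals)
```

Under the squared loss the outlier contributes 1250 to the total, while under the bounded loss its contribution is capped at `c * c / 6`, so it cannot dominate the fit. The flat region is also why such losses are non-convex and sensitive to initialization.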

estimator, gradient, loss function, (17 more...)

2002.02054

Country:

Europe > Austria > Vienna (0.14)
Oceania > Australia > Tasmania (0.04)
North America > Canada > British Columbia (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

arXiv.org Artificial Intelligence, Feb-5-2020

A Survey on Causal Inference

Yao, Liuyi, Chu, Zhixuan, Li, Sheng, Li, Yaliang, Gao, Jing, Zhang, Aidong

Causal inference has been a critical research topic across many domains, such as statistics, computer science, education, public policy and economics, for decades. Nowadays, estimating causal effects from observational data has become an appealing research direction owing to the large amount of available data and the low budget requirement compared with randomized controlled trials. Alongside the rapid development of machine learning, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well-known causal inference frameworks. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. The plausible applications of these methods are also presented, including applications in advertising, recommendation, medicine and so on. Moreover, the commonly used benchmark datasets and open-source code are also summarized, which helps researchers and practitioners explore, evaluate and apply the causal inference methods.
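As one small example of estimating a causal effect from observational data, the sketch below computes an average treatment effect by inverse propensity weighting. The estimator is a textbook one chosen for illustration, the propensity scores are assumed to be known rather than estimated, and the function name `ipw_ate` and the toy numbers are hypothetical.

```python
def ipw_ate(treated, outcomes, propensities):
    """Inverse-propensity-weighted estimate of the average treatment effect.

    treated[i] is 1 if unit i received the treatment, else 0;
    propensities[i] is the (assumed known) probability of treatment for unit i.
    """
    n = len(outcomes)
    total = 0.0
    for t, y, e in zip(treated, outcomes, propensities):
        # Each observed outcome is re-weighted by how unlikely its
        # treatment assignment was, to mimic a randomized trial.
        total += t * y / e - (1 - t) * y / (1.0 - e)
    return total / n


# Toy data with equal propensities, where IPW reduces to a difference in means.
ate = ipw_ate(
    treated=[1, 1, 0, 0],
    outcomes=[3.0, 5.0, 1.0, 3.0],
    propensities=[0.5, 0.5, 0.5, 0.5],
)
```

With unequal propensities the weights differ per unit, which is how the estimator corrects for treated and control groups that are not directly comparable.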

inference, propensity score, treatment effect, (16 more...)

2002.0277

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Tennessee (0.04)
North America > United States > New York > Erie County > Buffalo (0.04)
(6 more...)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (1.00)
Information Technology (0.93)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
(5 more...)

Van Calster, Tine, Bossche, Filip Van den, Baesens, Bart, Lemahieu, Wilfried

Profit-oriented sales forecasting: a comparison of forecasting techniques from a business perspective

arXiv.org Machine Learning, Feb-3-2020

Choosing the technique that is best at forecasting your data is a problem that arises in any forecasting application. Decades of research have resulted in an enormous number of forecasting methods that stem from statistics, econometrics and machine learning (ML), which makes the choice in any forecasting exercise difficult and elaborate. This paper aims to facilitate this process for high-level tactical sales forecasts by comparing a large array of techniques on 35 time series that consist of both industry data from the Coca-Cola Company and publicly available datasets. However, instead of solely focusing on the accuracy of the resulting forecasts, this paper introduces a novel and completely automated profit-driven approach that takes into account the expected profit that a technique can create during both the model building and evaluation process. The expected profit function that is used for this purpose is easy to understand and adaptable to any situation by combining forecasting accuracy with business expertise. Furthermore, we examine the added value of ML techniques, the inclusion of external factors and the use of seasonal models in order to ascertain which type of model works best in tactical sales forecasting. Our findings show that simple seasonal time series models consistently outperform other methodologies and that the profit-driven approach can lead to selecting a different forecasting model.

forecast, forecasting, forecasting technique, (16 more...)

2002.00949

Country:

Europe > United Kingdom (0.14)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > France (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.69)
Banking & Finance > Economy (0.68)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)