AITopics

1810.07287

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Firat, Murat, Crognier, Guillaume, Gabor, Adriana F., Zhang, Yingqian, Hurkens, C. A. J.

Constructing classification trees using column generation

arXiv.org Machine Learning, Oct-15-2018

In classification problems, the goal is to decide the class membership of a set of observations, by using available information on features and class membership of a training data set. Decision trees are one of the most popular models for solving this problem, due to their effectiveness and high interpretability. In this work, we focus on constructing univariate binary decision trees of prespecified depth. In a univariate binary decision tree, each internal node contains a test regarding the value of one single feature of the data set, while the leaves contain the target classes. The problem of constructing (learning) a classification tree (CTCP) is the problem of finding a set of optimal tests (decision checks), such that the assignment of target classes to rows satisfies a certain criterion. A commonly encountered objective is accuracy, measured as the number of correct predictions in a training set. As the problem of learning optimal decision trees is an NP-complete problem (Hyafil and Rivest 1976), heuristics such as CART (Breiman et al. 1984) and ID3 (Quinlan 1986) are widely used. These greedy algorithms build a tree recursively, starting from a single node.
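As a minimal sketch of the objects this abstract describes (not the paper's column generation method), a univariate binary tree of fixed depth can be written as nested nodes that each test a single feature against a threshold, with accuracy counted as the number of correct predictions on the training rows. The names `Leaf`, `Node`, `predict`, and `accuracy`, the depth-2 tree, and the toy data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: int  # target class assigned to rows that reach this leaf


@dataclass
class Node:
    feature: int      # index of the single feature tested at this node
    threshold: float  # rows with value <= threshold go left, others go right
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def predict(tree, row):
    """Follow the univariate tests from the root down to a leaf."""
    while isinstance(tree, Node):
        tree = tree.left if row[tree.feature] <= tree.threshold else tree.right
    return tree.label


def accuracy(tree, rows, labels):
    """Number of correctly classified training rows."""
    return sum(predict(tree, r) == y for r, y in zip(rows, labels))


# Hypothetical depth-2 tree over two binary features.
tree = Node(0, 0.5,
            left=Node(1, 0.5, Leaf(0), Leaf(1)),
            right=Node(1, 0.5, Leaf(1), Leaf(0)))

# Toy training set; the last row is deliberately misclassified by the tree.
rows = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]
labels = [0, 1, 1, 0, 1]
correct = accuracy(tree, rows, labels)
```

An exact method would search over the `feature` and `threshold` choices to maximize this count, whereas greedy heuristics fix them one node at a time.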

artificial intelligence, decision tree learning, machine learning, (18 more...)

1810.06684

Country: Europe (0.93)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligence, Oct-11-2018, 14:19:33 GMT

Random Forests and the Bias-Variance Tradeoff – Towards Data Science

The Random Forest is an extremely popular machine learning algorithm. Often, with not too much pre-processing, one can throw together a quick and dirty model with no hyperparameter tuning and achieve results that aren't awful. As an example, I put together a RandomForestRegressor in Python using scikit-learn for the New York City Taxi Fare Prediction playground competition on Kaggle recently, passing in no arguments to the model constructor and using 1/100 of the training data (554238 of 55M rows), for a validation R² of 0.8. NOTE: This snippet assumes you split the data into training and validation sets with your features and target variable separated. You can see the full code on my GitHub profile.
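For readers unfamiliar with the validation R² quoted above, here is a minimal sketch of how the coefficient of determination is computed from held-out targets and predictions. It is not the author's code: the function name `r_squared` and the sample values are assumptions for illustration. In scikit-learn, the `score` method of a regressor reports this same quantity.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after using the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical validation targets and predictions, for illustration only.
y_val = [5.0, 7.5, 10.0, 12.5, 15.0]
y_hat = [6.0, 7.0, 9.0, 13.0, 14.0]
score = r_squared(y_val, y_hat)
```

A score of 1.0 means perfect predictions, and 0.0 means the model does no better than predicting the mean of the validation targets.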

artificial intelligence, machine learning, random forest, (7 more...)

Country: North America > United States > New York (0.26)

Industry:

Transportation > Passenger (0.57)
Transportation > Ground > Road (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.66)

#artificialintelligence, Oct-11-2018, 03:13:47 GMT

A gentle introduction to decision trees using R

Most techniques of predictive analytics have their origins in probability or statistical theory (see my post on Naïve Bayes, for example). In this post I'll look at one that has a more commonplace origin: the way in which humans make decisions. When making decisions, we typically identify the options available and then evaluate them based on criteria that are important to us. The intuitive appeal of such a procedure is in no small measure due to the fact that it can be easily explained through a visual. The tree structure depicted here provides a neat, easy-to-follow description of the issue under consideration and its resolution.

artificial intelligence, decision tree learning, machine learning, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

#artificialintelligence, Oct-10-2018, 02:56:45 GMT

Glossary of Machine Learning Terms

ROC curves are widely used because they are relatively simple to understand and capture more than one aspect of the classification.
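One way to see the "more than one aspect" point is that each point on a ROC curve pairs a false positive rate with a true positive rate at a given score threshold. The sketch below is a plain-Python illustration under assumed names (`roc_points`, `auc`) and made-up scores. It is not taken from the glossary itself.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores and true binary labels (1 = positive).
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

Lowering the threshold moves along the curve from (0, 0) toward (1, 1), trading more true positives for more false positives.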

artificial intelligence, inductive learning, machine learning, (19 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
(2 more...)

Zhou, Zhengyuan, Athey, Susan, Wager, Stefan

Offline Multi-Action Policy Learning: Generalization and Optimization

arXiv.org Machine Learning, Oct-10-2018

In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action. Examples include selecting offers, prices, advertisements, or emails to send to consumers, as well as the problem of determining which medication to prescribe to a patient. While there is a growing body of literature devoted to this problem, most existing results are focused on the case where data comes from a randomized experiment, and further, there are only two possible actions, such as giving a drug to a patient or not. In this paper, we study the offline multi-action policy learning problem with observational data and where the policy may need to respect budget constraints or belong to a restricted policy class such as decision trees. We build on the theory of efficient semi-parametric inference in order to propose and implement a policy learning algorithm that achieves asymptotically minimax-optimal regret. To the best of our knowledge, this is the first result of this type in the multi-action setup, and it provides a substantial performance improvement over the existing learning algorithms. We then consider additional computational challenges that arise in implementing our method for the case where the policy is restricted to take the form of a decision tree. We propose two different approaches, one using a mixed integer program formulation and the other using a tree-search based algorithm.

algorithm, artificial intelligence, machine learning, (17 more...)

1810.04778

Country: North America > United States (0.67)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education > Educational Setting (0.92)
Education > Focused Education > Special Education (0.48)
Health & Medicine > Therapeutic Area > Oncology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Fitzsimons, Jack, Ali, AbdulRahman Al, Osborne, Michael, Roberts, Stephen

Equality Constrained Decision Trees: For the Algorithmic Enforcement of Group Fairness

arXiv.org Artificial Intelligence, Oct-10-2018

Fairness, through its many forms and definitions, has become an important issue facing the machine learning community. In this work, we consider how to incorporate group fairness constraints in kernel regression methods. More specifically, we focus on examining the incorporation of these constraints in decision tree regression when cast as a form of kernel regression, with direct applications to random forests and boosted trees amongst other widespread popular inference techniques. We show that the order of complexity of memory and computation is preserved for such models and bound the expected perturbations to the model in terms of the number of leaves of the trees. Importantly, the approach works on trained models and hence can be easily applied to models in current use.

artificial intelligence, constraint, machine learning, (16 more...)

1810.05041

Country: North America > United States (0.68)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Mistry, Miten, Letsios, Dimitrios, Krennrich, Gerhard, Lee, Robert M., Misener, Ruth

Mixed-Integer Convex Nonlinear Optimization with Gradient-Boosted Trees Embedded

arXiv.org Artificial Intelligence, Oct-5-2018

Decision trees usefully represent sparse, high dimensional and noisy data. Having learned a function from this data, we may want to thereafter integrate the function into a larger decision-making problem, e.g., for picking the best chemical process catalyst. We study a large-scale, industrially-relevant mixed-integer nonlinear nonconvex optimization problem involving both gradient-boosted trees and penalty functions mitigating risk. This mixed-integer optimization problem with convex penalty terms broadly applies to optimizing pre-trained regression tree models. Decision makers may wish to optimize discrete models to repurpose legacy predictive models, or they may wish to optimize a discrete model that particularly well-represents a data set. We develop several heuristic methods to find feasible solutions, and an exact, branch-and-bound algorithm leveraging structural properties of the gradient-boosted trees and penalty functions. We computationally test our methods on a concrete mixture design instance and a chemical catalysis industrial instance.

artificial intelligence, machine learning, optimization problem, (15 more...)

1803.00952

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.50)

Industry: Materials > Chemicals (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(2 more...)

#artificialintelligence, Oct-4-2018, 18:12:13 GMT

Is Machine Learning Analytics or AI? - International Institute for Analytics

One of the definitional debates that bedevils the artificial intelligence (AI) field is whether machine learning is an AI-based method or technology. Or is it just an analytics-based activity? After all, it is statistical in nature, and attempts, as virtually all analytical methods do, to fit a line or curve to a set of data points. And what difference does it make? Basic machine learning is practically indistinguishable from predictive analytics.

artificial intelligence, decision tree learning, machine learning, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.30)

arXiv.org Machine Learning, Oct-3-2018

Machine Learning Suites for Online Toxicity Detection

Noever, David

To identify and classify toxic online commentary, the modern tools of data science transform raw text into key features from which either thresholding or learning algorithms can make predictions for monitoring offensive conversations. We systematically evaluate 62 classifiers representing 19 major algorithmic families against features extracted from the Jigsaw dataset of Wikipedia comments. We compare the classifiers based on statistically significant differences in accuracy and relative execution time. Among these classifiers for identifying toxic comments, tree-based algorithms provide the most transparently explainable rules and rank-order the predictive contribution of each feature. Among 28 features of syntax, sentiment, emotion and outlier word dictionaries, a simple bad word list proves most predictive of offensive commentary. Introduction: In 2015, the Twitter CEO, Dick Costolo, took personal responsibility for online harassment, trolling and abuse on the Twitter platform.

artificial intelligence, machine learning, social media, (20 more...)

1810.01869

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)