AITopics

2001.07787

Country:

North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Ontario > Middlesex County > London (0.04)

Genre: Research Report (0.64)

Industry: Information Technology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Machine Learning, Jan-13-2020

Trees, forests, and impurity-based variable importance

Scornet, Erwan

Tree ensemble methods such as random forests [Breiman, 2001] are very popular for handling high-dimensional tabular data sets, notably because of their good predictive accuracy. However, when machine learning is used for decision-making problems, settling for the best predictive procedures may not be reasonable, since enlightened decisions require an in-depth comprehension of the algorithm's prediction process. Unfortunately, random forests are not intrinsically interpretable, since their prediction results from averaging several hundred decision trees. A classic approach to gaining knowledge of this so-called black-box algorithm is to compute variable importances, which are employed to assess the predictive impact of each input variable. Variable importances are then used to rank or select variables and thus play a great role in data analysis. Nevertheless, there is no justification for using random forest variable importances in such a way: we do not even know what these quantities estimate. In this paper, we analyze one of the two well-known random forest variable importances, the Mean Decrease Impurity (MDI). We prove that if input variables are independent and in the absence of interactions, MDI provides a variance decomposition of the output, where the contribution of each variable is clearly identified. We also study models exhibiting dependence between input variables or interaction, for which the variable importance is intrinsically ill-defined. Our analysis shows that there may be some benefits to using a forest compared to a single tree.
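The variance-decomposition claim can be illustrated with a toy sketch. This is not the paper's construction: it grows one greedy regression tree on a hypothetical additive model Y = 2*X1 + X2 with independent binary inputs, and computes MDI as the sum, over all splits on a variable, of the variance decrease weighted by the fraction of samples reaching the node. The function names `best_split` and `mdi` and the data are assumptions made for illustration.

```python
from statistics import pvariance


def best_split(rows, ys):
    """Return (decrease, feature, threshold) for the best variance-reducing split."""
    best = (0.0, None, None)
    parent = pvariance(ys)
    n = len(ys)
    for j in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            children = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
            if parent - children > best[0]:
                best = (parent - children, j, t)
    return best


def mdi(rows, ys, n_total=None, importances=None):
    """Accumulate weighted variance decreases per feature over a greedy tree."""
    if n_total is None:
        n_total = len(ys)
    if importances is None:
        importances = [0.0] * len(rows[0])
    decrease, j, t = best_split(rows, ys)
    if j is None:  # no split reduces the variance: this node is a leaf
        return importances
    importances[j] += len(ys) / n_total * decrease
    for keep in (lambda v: v <= t, lambda v: v > t):
        sub = [(r, y) for r, y in zip(rows, ys) if keep(r[j])]
        mdi([r for r, _ in sub], [y for _, y in sub], n_total, importances)
    return importances


# Hypothetical toy data: all four combinations of two independent binary inputs.
rows = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
ys = [2 * x1 + x2 for x1, x2 in rows]
importances = mdi(rows, ys)
print(importances, sum(importances), pvariance(ys))
```

On this toy data the importances come out as 1.0 for X1 and 0.25 for X2, which equal Var(2*X1) and Var(X2) and sum to Var(Y) = 1.25. With dependent inputs or interactions, no such clean split of the variance should be expected.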

mdi, theoretical tree, variance, (17 more...)

2001.04295

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Europe > France (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

#artificialintelligence, Jan-12-2020, 18:22:40 GMT

An Introduction to Random Forest with Python and scikit-learn

NOTE: This post assumes a basic understanding of decision trees. If you need a refresher on how Decision Trees work, I recommend first reading An Introduction to Decision Trees with Python and scikit-learn. The good thing about Random Forest is that if we understand Decision Trees very well, it should be very easy to understand Random Forest as well. The name Random Forest actually describes the added features pretty well. Firstly, we now have something that is random, which I'll explain in more depth.
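The excerpt stops before saying what is random. In the standard random forest algorithm, the two random ingredients are a bootstrap sample of the rows for each tree and a random subset of the features considered at each split. The following is a minimal sketch of just those two draws, using only the standard library; the helper names `bootstrap_sample` and `random_feature_subset` and the sizes used are assumptions for illustration, not code from the post.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices to consider at one split."""
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for tree_id in range(3):
    rows = bootstrap_sample(10, rng)
    features = random_feature_subset(n_features=6, k=2, rng=rng)
    print(tree_id, rows, features)
```

Each tree therefore sees a slightly different data set and a restricted choice of features, which is what makes the trees in the forest differ from one another.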

decision tree, python and scikit-learn, random forest, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligence, Jan-9-2020, 20:58:09 GMT

Converting Handwritten Math Symbols into Text Using Random Forest

The Inspiration: Is it fair to say mathematicians are averse to technology? My lifelong love for math inevitably led me to an undergraduate study in mathematics. Soon after taking my first college statistics course, I realized I also had a knack for understanding and interpreting data, as well as coding in the programming language R. After graduating with a Mathematics B.Sc., I became a high school teacher. Even though I can truly say I enjoyed what I did, I still felt the need to search for a more technically challenging career path.

equation, handwritten math symbol, mathematician, (4 more...)

Industry: Education > Educational Setting > K-12 Education (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.42)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.42)

arXiv.org Machine Learning, Jan-8-2020

A Comparative Study on Crime in Denver City Based on Machine Learning and Data Mining

Ratul, Md. Aminur Rab

To ensure the security of the general public, crime prevention is one of the highest priorities for any government. An accurate crime prediction model can help the government and law enforcement prevent violence, detect criminals in advance, allocate government resources, and recognize the problems that cause crime. To construct any future-oriented tool, it is essential to examine and understand crime patterns as early as possible. In this paper, I analyze a real-world crime and accident dataset of Denver County, USA, from January 2014 to May 2019, which contains 478,578 incidents. This project aims to predict and highlight trends of occurrence that will, in turn, help law enforcement agencies and the government devise preventive measures from the prediction rates. First, I apply several statistical analyses supported by several data visualization approaches. Then, I implement various classification algorithms such as Random Forest, Decision Tree, AdaBoost Classifier, Extra Tree Classifier, Linear Discriminant Analysis, K-Neighbors Classifiers, and 4 Ensemble Models to classify 15 different classes of crimes. The outcomes are captured using two popular test methods: train-test split and k-fold cross-validation. Moreover, to evaluate the performance thoroughly, I also use precision, recall, F1-score, Mean Squared Error (MSE), ROC curve, and paired t-test. Except for the AdaBoost classifier, most of the algorithms exhibit satisfactory accuracy. Random Forest, Decision Tree, and Ensemble Models 1, 3, and 4 even achieve more than 90% accuracy. Among all the approaches, Ensemble Model 4 presented superior results on every evaluation basis. This study could be useful for raising public awareness of where crimes occur and for helping security agencies predict future outbreaks of violence in a specific area within a particular time.
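Two of the evaluation tools named in the abstract, k-fold cross-validation and per-class precision, recall and F1, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the study's code: the function names, the contiguous (unshuffled) folds, and the crime labels in the usage lines are all hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels, for illustration only.
y_true = ["theft", "theft", "assault", "theft"]
y_pred = ["theft", "assault", "assault", "theft"]
print(precision_recall_f1(y_true, y_pred, positive="theft"))
print(list(k_fold_indices(5, 2)))
```

In practice the rows are usually shuffled (and often stratified by class) before folding; the contiguous folds here only keep the sketch short.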

algorithm, crime, dataset, (13 more...)

2001.02802

Country:

North America > United States > Colorado > Denver County (0.34)
Africa > Middle East > Libya (0.14)
North America > United States > New York (0.04)

Genre: Research Report > Experimental Study (0.34)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.90)

Mamprin, Marco, Zelis, Jo M., Tonino, Pim A. L., Zinger, Svitlana, de With, Peter H. N.

Gradient Boosting on Decision Trees for Mortality Prediction in Transcatheter Aortic Valve Implantation

arXiv.org Machine Learning, Jan-8-2020

Current prognostic risk scores in cardiac surgery are based on statistics and do not yet benefit from machine learning. Statistical predictors are not robust enough to correctly identify patients who would benefit from Transcatheter Aortic Valve Implantation (TAVI). This research aims to create a machine learning model to predict the one-year mortality of a patient after TAVI. We adopt a modern gradient boosting on decision trees algorithm, specifically designed for categorical features. In combination with a recent technique for model interpretation, we developed a feature analysis and selection stage that makes it possible to identify the most important features for the prediction. We base our prediction model on the most relevant features, after interpreting and discussing the feature analysis results with clinical experts. We validated our model on 270 TAVI cases, reaching an AUC of 0.83. Our approach outperforms several widespread prognostic risk scores, such as logistic EuroSCORE II, the STS risk score and the TAVI2-score, which are broadly adopted by cardiologists worldwide.
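The AUC quoted above has a simple reading: it is the probability that a randomly chosen positive case (here, a patient who died within one year) receives a higher predicted risk than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the labels and scores are hypothetical and unrelated to the study's data or model.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = event) and predicted risks.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for a few hundred patients; rank-based formulas give the same value faster on large data sets.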

categorical feature, dataset, mortality, (13 more...)

2001.02431

Country: Europe > Netherlands > North Brabant > Eindhoven (0.05)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

#artificialintelligence, Jan-4-2020, 05:52:05 GMT

Classification (Supervised Learning) In Data Mining

Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute. The set of tuples used for model construction: training(testing) set. The model is represented as classification rules, decision trees, or statistical or mathematical formulae.

classification, distance measurement, higher-level concept, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.39)

Shaker, Mohammad Hossein, Hüllermeier, Eyke

Aleatoric and Epistemic Uncertainty with Random Forests

arXiv.org Machine Learning, Jan-3-2020

Due to the steadily increasing relevance of machine learning for practical applications, many of which come with safety requirements, the notion of uncertainty has received increasing attention in machine learning research in the last couple of years. In particular, the idea of distinguishing between two important types of uncertainty, often referred to as aleatoric and epistemic, has recently been studied in the setting of supervised learning. In this paper, we propose to quantify these uncertainties with random forests. More specifically, we show how two general approaches for measuring the learner's aleatoric and epistemic uncertainty in a prediction can be instantiated with decision trees and random forests as learning algorithms in a classification setting. In this regard, we also compare random forests with deep neural networks, which have been used for a similar purpose.
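One widely used way to separate the two kinds of uncertainty in an ensemble is an entropy-based decomposition: total uncertainty is the entropy of the averaged class distribution, aleatoric uncertainty is the average entropy of the individual trees' distributions, and epistemic uncertainty is the difference. The sketch below assumes each tree outputs a class-probability vector; whether this matches either of the two approaches in the paper is an assumption, and the function names are illustrative.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return sum(-p * log2(p) for p in probs if p > 0)


def uncertainty_decomposition(tree_probs):
    """Return (total, aleatoric, epistemic) for per-tree class distributions."""
    n_trees = len(tree_probs)
    n_classes = len(tree_probs[0])
    mean_probs = [sum(t[c] for t in tree_probs) / n_trees for c in range(n_classes)]
    total = entropy(mean_probs)  # entropy of the averaged prediction
    aleatoric = sum(entropy(t) for t in tree_probs) / n_trees  # mean per-tree entropy
    return total, aleatoric, total - aleatoric


# Trees that are individually certain but disagree: purely epistemic.
print(uncertainty_decomposition([[1.0, 0.0], [0.0, 1.0]]))
# Trees that agree the classes are equally likely: purely aleatoric.
print(uncertainty_decomposition([[0.5, 0.5], [0.5, 0.5]]))
```

The first case yields one bit of epistemic uncertainty and no aleatoric uncertainty; the second yields the reverse, which is the distinction the abstract draws.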

epistemic uncertainty, prediction, random forest, (15 more...)

2001.00893

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Machine Learning, Jan-2-2020

Explainable outlier detection through decision tree conditioning

Cortes, David

This work describes an outlier-detection procedure that aims at producing explanations for why an observation/point can be considered to be anomalous, which are obtained by finding smart conditional distributions of a given variable under which the anomalous observation/point in question would fall according to the conditions, but for which its value on a variable of interest would not match with the distribution of the other observations. These conditional distributions are obtained by splitting/separating/conditioning observations according to some other variable(s) in such a way that the information gain ([8]) in the variable of interest obtained by splitting the observations (assigning to two or more groups) is maximized, in a similar way as decision tree algorithms such as CART ([3]) or C5.0 ([8]), which ensure that the conditions that are set for a variable are not spurious, but rather related to the multivariate distribution of the data, and the anomalous value put into context by presenting key information about the variable's distribution among the rest of the observations. An example explainable outlier is sketched below: row [2230] - suspicious column: [T3] - suspicious value: [10.
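The core idea, a value that looks ordinary overall but anomalous once the data are conditioned on another variable, can be shown with a small sketch. It simplifies the described procedure in two ways that should be read as assumptions: the conditioning variable is given by hand rather than chosen by maximizing information gain, and the outlier test is a median-absolute-deviation rule with an arbitrary threshold `k`. The function names and records are hypothetical.

```python
from statistics import median


def mad_outliers(values, k=5.0):
    """Return values farther than k median-absolute-deviations from the median."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread to compare against
    return [v for v in values if abs(v - center) > k * mad]


def conditional_outliers(records, k=5.0):
    """Group (group, value) records and report outliers within each group."""
    groups = {}
    for group, value in records:
        groups.setdefault(group, []).append(value)
    flagged = {g: mad_outliers(vs, k) for g, vs in groups.items()}
    return {g: vs for g, vs in flagged.items() if vs}


# Hypothetical records: 30 is typical for group B but unusual for group A.
records = [("A", 10), ("A", 11), ("A", 9), ("A", 10), ("A", 12), ("A", 30),
           ("B", 28), ("B", 30), ("B", 31), ("B", 29), ("B", 32)]
print(mad_outliers([v for _, v in records]))  # unconditioned check
print(conditional_outliers(records))          # check within each group
```

Without conditioning, nothing is flagged; within group A the value 30 stands out, and the group condition itself serves as the explanation for why it is suspicious.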

gritbot, outlier, outliertree, (15 more...)

2001.00636

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligence, Dec-31-2019, 14:54:31 GMT

Machine Learning and Data Science Hands-on with Python and R

Learn from well-designed, well-crafted study materials on Machine Learning (ML), Statistics, Python, Artificial Intelligence (AI), TensorFlow, AWS, Deep Learning, R Programming, NLP, Bayesian Methods, A/B Testing, Face Detection, Business Intelligence (BI), Regression, Hypothesis Testing, Algebra, AdaBoost Regressor, Gaussian, Heuristic, NumPy, Pandas, Matplotlib, Seaborn, Forecasting, Distribution, Normalization, Trend Analysis, Predictive Modeling, Fraud Detection, Neural Network, Sequential Model, Data Visualization, Data Analysis, Data Manipulation, KNN Algorithm, Decision Tree, Random Forests, K-means Clustering, Vector Machine, Time Series Analysis, Market Basket Analysis. Get the skills to work with implementations and develop capabilities that you can use to deliver results in a machine learning project. This program will help you build the foundation for a solid career in machine learning tools. Machine learning is a scientific discipline that explores the construction and study of algorithms that can learn from data. Such algorithms operate by building a model from example inputs and using that to make predictions or decisions, rather than following strictly static program instructions.

intelligence, learning and data science hand-on, machine learning, (10 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.51)

Industry: Education > Educational Setting > Online (0.80)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.56)