AITopics

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligence, Sep-29-2020, 13:16:16 GMT

Back to Machine Learning Basics - Decision Tree & Random Forest

For example, suppose a node contains 43 training instances, of which 13 belong to one class and 30 belong to a second class. Given that these are the only two classes in the training dataset, the Gini impurity is 1 – (13/43)^2 – (30/43)^2 ≈ 1 – 0.09 – 0.49 = 0.42. When the node is "pure", its Gini index is 0. Information gain, on the other hand, lets us find the threshold that reduces this impurity the most. To calculate it, we compute the average impurity of the child nodes produced by a split and subtract that from the starting impurity. That is how we measure the quality of the thresholds we tried.
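The arithmetic in this teaser can be checked with a few lines of Python. This is a minimal sketch, not the article's own code: the function names `gini` and `impurity_reduction` are illustrative, the class labels "a" and "b" are placeholders, and weighting each child by its share of the instances is an assumption (the usual CART convention) about what "average impurity" means here.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_reduction(parent, left, right):
    """Parent impurity minus the size-weighted average impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# The node from the text: 13 instances of one class, 30 of another.
node = ["a"] * 13 + ["b"] * 30
print(round(gini(node), 2))  # 0.42

# A hypothetical split of that node (the split itself is made up for illustration).
left, right = ["a"] * 13 + ["b"] * 5, ["b"] * 25
print(impurity_reduction(node, left, right) > 0)  # True
```

A split that leaves both children as mixed as the parent would give a reduction close to zero, which is why candidate thresholds are compared by this quantity.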

artificial intelligence, decision tree & random forest, decision tree learning, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.85)

arXiv.org Machine Learning, Sep-29-2020

On $\ell_p$-norm Robustness of Ensemble Stumps and Trees

Wang, Yihan, Zhang, Huan, Chen, Hongge, Boning, Duane, Hsieh, Cho-Jui

Recent papers have demonstrated that ensemble stumps and trees could be vulnerable to small input perturbations, so robustness verification and defense for those models have become an important research problem. However, due to the structure of decision trees, where each node makes a decision based purely on one feature value, all previous work considers only the $\ell_\infty$ norm perturbation. To study robustness with respect to a general $\ell_p$ norm perturbation, one has to consider the correlation between perturbations on different features, which has not been handled by previous algorithms. In this paper, we study the problem of robustness verification and certified defense with respect to general $\ell_p$ norm perturbations for ensemble decision stumps and trees. For robustness verification of ensemble stumps, we prove that complete verification is NP-complete for $p\in(0, \infty)$ while polynomial time algorithms exist for $p=0$ or $\infty$. For $p\in(0, \infty)$ we develop an efficient dynamic programming based algorithm for sound verification of ensemble stumps. For ensemble trees, we generalize the previous multi-level robustness verification algorithm to $\ell_p$ norm. We demonstrate the first certified defense method for training ensemble stumps and trees with respect to $\ell_p$ norm perturbations, and verify its effectiveness empirically on real datasets.
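To make the $\ell_\infty$ case mentioned in the abstract concrete, the sketch below computes the exact worst-case score of a stump ensemble over an $\ell_\infty$ ball. It is an illustration under stated assumptions, not the paper's algorithm: the stump encoding `(feature, threshold, left_value, right_value)`, the function name `min_margin_linf`, and the example numbers are all hypothetical, and the general-$p$ dynamic programming method from the paper is not shown. The idea is that an $\ell_\infty$ ball is a box, so each feature can be attacked independently and the per-feature minima simply add up.

```python
from collections import defaultdict

def min_margin_linf(stumps, x, eps):
    """Exact minimum of the ensemble score over the l_inf ball of radius eps around x.

    stumps: list of (feature, threshold, left_value, right_value); a stump
    outputs left_value when x[feature] <= threshold, else right_value.
    """
    by_feature = defaultdict(list)
    for feature, threshold, left, right in stumps:
        by_feature[feature].append((threshold, left, right))

    total = 0.0
    for feature, group in by_feature.items():
        lo, hi = x[feature] - eps, x[feature] + eps
        # The per-feature score is piecewise constant, so it is enough to
        # evaluate it at the interval ends and at thresholds inside the interval.
        candidates = {lo, hi} | {t for t, _, _ in group if lo <= t <= hi}
        total += min(
            sum(left if v <= t else right for t, left, right in group)
            for v in candidates
        )
    return total

# Hypothetical ensemble: two stumps on feature 0, one on feature 1.
stumps = [(0, 0.5, -1.0, 1.0), (0, 0.75, -0.25, 0.5), (1, 0.25, -0.5, 0.5)]
x = [0.75, 0.5]
print(min_margin_linf(stumps, x, 0.125))  # 1.25  (score stays positive)
print(min_margin_linf(stumps, x, 0.25))   # -1.75 (an attack can flip the sign)
```

For a point labelled positive, a worst-case score above zero certifies robustness at that radius. For $p\in(0, \infty)$ the perturbation budget is shared across features, so this per-feature decomposition no longer applies.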

artificial intelligence, machine learning, optimization problem, (16 more...)

2008.08755

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

#artificialintelligence, Sep-28-2020, 12:40:18 GMT

CHIRPS: Explaining random forest classification

Modern machine learning methods typically produce "black box" models that are opaque to interpretation. Yet demand for them has been increasing in Human-in-the-Loop processes, that is, processes that require a human agent to verify, approve or reason about automated decisions before they can be applied. To facilitate this interpretation, we propose Collection of High Importance Random Path Snippets (CHIRPS), a novel algorithm for explaining random forest classification per data instance. CHIRPS extracts a decision path from each tree in the forest that contributes to the majority classification, and then uses frequent pattern mining to identify the most commonly occurring split conditions. Then a simple, conjunctive form rule is constructed where the antecedent terms are derived from the attributes that had the most influence on the classification.
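The pipeline described in this abstract (collect paths from majority-voting trees, find frequent split conditions, build a conjunctive rule) can be sketched in a much simplified form. The code below is not the CHIRPS algorithm: it counts single conditions instead of running frequent pattern mining, it does no scoring of candidate rules, and the `paths` data, the condition strings and the function name `explain` are hypothetical.

```python
from collections import Counter

# Hypothetical decision paths for one instance, one per tree.
# Each entry is (predicted_class, [split conditions the instance satisfied]).
paths = [
    ("yes", ["age > 30", "income <= 50k"]),
    ("yes", ["age > 30", "owns_home"]),
    ("no",  ["income <= 50k"]),
    ("yes", ["age > 30", "income <= 50k"]),
]

def explain(paths, top_k=2):
    """Build a conjunctive rule from the most frequent conditions on majority-class paths."""
    majority = Counter(cls for cls, _ in paths).most_common(1)[0][0]
    counts = Counter(
        cond
        for cls, conds in paths
        if cls == majority
        for cond in dict.fromkeys(conds)  # de-duplicate within a path, keep order
    )
    antecedent = [cond for cond, _ in counts.most_common(top_k)]
    return majority, antecedent

label, rule = explain(paths)
print(f"IF {' AND '.join(rule)} THEN {label}")
# IF age > 30 AND income <= 50k THEN yes
```

Only paths from trees that voted with the majority are counted, which mirrors the abstract's "contributes to the majority classification" step.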

classification, machine learning, pattern recognition, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.62)

#artificialintelligence, Sep-28-2020, 05:40:35 GMT

Never Ignore these 5 Machine Learning Modeling Challenges

Okay, you have decided to build your own machine learning model. You are using Sklearn, a popular machine learning library for modeling. But wait: do you know the common machine learning modeling challenges faced by every data scientist? If not, you have come to the right place. Here you will learn about each modeling challenge you face while building a model. When you have a categorical target dataset.

artificial intelligence, decision tree learning, machine learning, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.31)

Guillame-Bert, Mathieu, Bruch, Sebastian, Mitrichev, Petr, Mikheev, Petr, Pfeifer, Jan

Modeling Text with Decision Forests using Categorical-Set Splits

arXiv.org Machine Learning, Sep-28-2020

Decision forest algorithms model data by learning a binary tree structure recursively where every node splits the feature space into two regions, sending examples into the left or right branches. This "decision" is the result of the evaluation of a condition. For example, a node may split input data by applying a threshold to a numerical feature value. Such decisions are learned using (often greedy) algorithms that attempt to optimize a local loss function. Crucially, whether an algorithm exists to find and evaluate splits for a feature type (e.g., text) determines whether a decision forest algorithm can model that feature type at all. In this work, we set out to devise such an algorithm for textual features, thereby equipping decision forests with the ability to directly model text without the need for feature transformation. Our algorithm is efficient during training and the resulting splits are fast to evaluate with our extension of the QuickScorer inference algorithm. Experiments on benchmark text classification datasets demonstrate the utility and effectiveness of our proposal.
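The greedy, locally optimized threshold split that the abstract uses as its example can be sketched as follows. This covers only the familiar numerical case, not the paper's categorical-set splits for text. The function names, the choice of Gini impurity as the local loss, the midpoint candidate thresholds and the toy data are all assumptions made for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Greedy search over midpoints between sorted distinct values.

    Returns (threshold, loss), where loss is the size-weighted impurity of the
    two branches; examples with value <= threshold go to the left branch.
    """
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for a, b in zip(distinct, distinct[1:]):
        t = (a + b) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        loss = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if loss < best[1]:
            best = (t, loss)
    return best

# Toy data: one numerical feature, two classes.
values = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = ["l", "l", "l", "r", "r", "r"]
print(best_threshold(values, labels))  # (4.5, 0.0)
```

The search is local: it picks the best split for this node only, which is what "greedy" refers to in the abstract. Supporting a new feature type means supplying a comparable routine that can propose and evaluate candidate conditions for that type.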

artificial intelligence, machine learning, natural language, (16 more...)

2009.09991

Country:

North America > United States (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.91)

#artificialintelligence, Sep-24-2020, 18:25:19 GMT

Great Machine Learning Project For Beginners – Predict NBA Player Position

So now that we've covered the basics of machine learning with regression models, let's move on to something a little more sophisticated: decision trees. What is a decision tree, you ask? A decision tree is a set of questions you can ask to classify different data points. It's called a tree because it has a tree-like shape, just inverted. If you've got the weather forecast for the day, it'd be pretty easy to look at it and determine if you'd want to go play tennis that day.
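The "set of questions" idea can be written out directly as nested conditions. This is a hand-made sketch for the tennis example, not something learned from data and not taken from the article: the feature names, the humidity cutoff of 70 and the rules themselves are hypothetical.

```python
def play_tennis(outlook, humidity, windy):
    """A hand-written decision tree: each `if` is one question asked at a node."""
    if outlook == "sunny":
        return humidity <= 70   # sunny days are fine unless it is humid
    if outlook == "rainy":
        return not windy        # rain alone is tolerable, rain plus wind is not
    return True                 # any other outlook (e.g. overcast): play

print(play_tennis("sunny", 65, False))  # True
print(play_tennis("rainy", 80, True))   # False
```

A learning algorithm such as the one behind a decision tree classifier chooses these questions and cutoffs automatically from labelled examples instead of having them written by hand.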

artificial intelligence, decision tree learning, machine learning, (13 more...)

Industry: Leisure & Entertainment > Sports > Basketball (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Soliman, Marwah, Lyubchich, Vyacheslav, Gel, Yulia R.

Ensemble Forecasting of the Zika Space-Time Spread with Topological Data Analysis

arXiv.org Machine Learning, Sep-24-2020

As per the records of the World Health Organization, the first formally reported incidence of Zika virus occurred in Brazil in May 2015. The disease then rapidly spread to other countries in the Americas and East Asia, affecting more than 1,000,000 people. Zika virus is primarily transmitted through bites of infected mosquitoes of the species Aedes (Aedes aegypti and Aedes albopictus). The abundance of mosquitoes and, as a result, the prevalence of Zika virus infections are common in areas which have high precipitation, high temperature, and high population density. Nonlinear spatio-temporal dependency of such data and lack of historical public health records make prediction of the virus spread particularly challenging. In this article, we enhance Zika forecasting by introducing the concepts of topological data analysis and, specifically, persistent homology of atmospheric variables, into the virus spread modeling. The topological summaries allow for capturing higher order dependencies among atmospheric variables that otherwise might be unassessable via conventional spatio-temporal modeling approaches based on geographical proximity assessed via Euclidean distance. We introduce a new concept of cumulative Betti numbers and then integrate the cumulative Betti numbers as topological descriptors into three predictive machine learning models: random forest, generalized boosted regression, and deep neural network. Furthermore, to better quantify various sources of uncertainty, we combine the resulting individual model forecasts into an ensemble of the Zika spread predictions using Bayesian model averaging. The proposed methodology is illustrated in application to forecasting of the Zika space-time spread in Brazil in the year 2018.

artificial intelligence, machine learning, zika virus, (14 more...)

2009.13423

Country:

Asia > East Asia (0.24)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.90)

Dewez, Florent, Guedj, Benjamin, Vandewalle, Vincent

From industry-wide parameters to aircraft-centric on-flight inference: improving aeronautics performance prediction with machine learning

arXiv.org Machine Learning, Sep-24-2020

Aircraft performance models play a key role in airline operations, especially in planning a fuel-efficient flight. In practice, manufacturers provide guidelines which are slightly modified throughout the aircraft life cycle via the tuning of a single factor, enabling better fuel predictions. However, this approach has limitations; in particular, it does not reflect the evolution of each feature impacting the aircraft performance. Our goal here is to overcome this limitation. The key contribution of the present article is to foster the use of machine learning to leverage the massive amounts of data continuously recorded during flights performed by an aircraft and provide models reflecting its actual and individual performance. We illustrate our approach by focusing on the estimation of the drag and lift coefficients from recorded flight data. As these coefficients are not directly recorded, we resort to aerodynamics approximations. As a safety check, we provide bounds to assess the accuracy of both the aerodynamics approximation and the statistical performance of our approach. We provide numerical results on a collection of machine learning algorithms. We report excellent accuracy on real-life data and exhibit empirical evidence to support our modelling, in coherence with aerodynamics principles.

artificial intelligence, coefficient, machine learning, (16 more...)

2005.05286

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.50)

Industry:

Transportation > Air (1.00)
Aerospace & Defense > Aircraft (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

#artificialintelligence, Sep-22-2020, 23:05:22 GMT

neomatrix369/awesome-ai-ml-dl

Contributions are very welcome; please share back with the wider community (and get credited for it)! Please have a look at the CONTRIBUTING guidelines, and also read about our licensing policy.

decision tree learning, java, machine learning, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)