AITopics

Genre: Instructional Material (0.63)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligence Dec-6-2019, 03:50:18 GMT

The Simple Math behind 3 Decision Tree Splitting criterions

Decision Trees are great and are useful for a variety of tasks. They form the backbone of most of the best-performing models in the industry, like XGBoost and LightGBM. But how do they work exactly? In fact, this is one of the most asked questions in ML/DS interviews. We generally know they work in a stepwise manner and have a tree structure where we split a node using some feature on some criterion.
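As a rough illustration of what "splitting a node on some criterion" can mean, the sketch below computes two commonly used impurity measures, Gini impurity and entropy, and the size-weighted impurity of a candidate split. That these are among the three criteria the article covers is an assumption based on its keywords; the function names are hypothetical and not taken from the article.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits over the classes present in the node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_impurity(left, right, criterion=gini):
    """Size-weighted impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)


# A perfectly mixed two-class node versus a split that separates it cleanly.
parent = ["yes", "yes", "no", "no"]
print(gini(parent), entropy(parent))
print(split_impurity(["yes", "yes"], ["no", "no"]))
```

A tree learner would typically evaluate `split_impurity` for many candidate splits and keep the one with the lowest value.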

entropy, impurity, node, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.64)

#artificialintelligence Dec-5-2019, 14:52:40 GMT

r/MachineLearning - [D] Efficient Partial Dependence Plots with decision trees

Partial Dependence Plots (PDPs) are a standard model inspection technique. It turns out that for decision trees, they can be computed very efficiently. This post explains how PDPs are computed in general, and goes into the details of the optimized version for tree models.
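For orientation, here is a minimal sketch of the general (brute-force) way a partial dependence curve can be computed: for each grid value, overwrite the feature in every row, predict, and average. It is not the tree-optimized version the post describes, and `toy_tree` is a hypothetical hand-written model used only for illustration.

```python
def partial_dependence(predict, rows, feature, grid):
    """Brute-force partial dependence of `predict` on column `feature`."""
    averages = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)       # copy so the input data is not mutated
            modified[feature] = value  # force the feature to the grid value
            total += predict(modified)
        averages.append(total / len(rows))
    return averages


# Hypothetical model: a hand-written depth-2 regression tree.
def toy_tree(row):
    if row[0] <= 2.0:
        return 1.0 if row[1] <= 0.5 else 3.0
    return 5.0


rows = [[1.0, 0.0], [1.5, 1.0], [3.0, 0.0], [4.0, 1.0]]
pd_values = partial_dependence(toy_tree, rows, feature=0, grid=[1.0, 3.0])
print(pd_values)  # → [2.0, 5.0]
```

The cost of this version is one model call per row per grid point, which is the overhead a tree-specific method would aim to avoid.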

decision tree, efficient partial dependence plot, partial dependence plot, (1 more...)

Industry: Media > News (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.80)

Ramosaj, Burim, Pauly, Markus

Asymptotic Unbiasedness of the Permutation Importance Measure in Random Forest Models

arXiv.org Machine Learning Dec-5-2019

Variable selection in sparse regression models is an important task as applications ranging from biomedical research to econometrics have shown. Especially for higher dimensional regression problems, for which the link function between response and covariates cannot be directly detected, the selection of informative variables is challenging. Under these circumstances, the Random Forest method is a helpful tool to predict new outcomes while delivering measures for variable selection. One common approach is the usage of the permutation importance. Due to its intuitive idea and flexible usage, it is important to explore circumstances, for which the permutation importance based on Random Forest correctly indicates informative covariates. Regarding the latter, we deliver theoretical guarantees for the validity of the permutation importance measure under specific assumptions and prove its (asymptotic) unbiasedness. An extensive simulation study verifies our findings.
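The intuition behind permutation importance can be sketched as follows: shuffle one covariate, and measure how much the prediction error increases. This is a simplified, model-agnostic version that assumes a generic `predict` function and mean squared error; the Random Forest variant studied in the paper is typically computed on out-of-bag samples, which this sketch does not do.

```python
import random


def mean_squared_error(predict, rows, targets):
    """Average squared difference between predictions and targets."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, feature, n_repeats=10, seed=0):
    """Average increase in MSE after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    increases = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the target
        permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        increases.append(mean_squared_error(predict, permuted, targets) - baseline)
    return sum(increases) / n_repeats


# Hypothetical setup: the response depends on feature 0 only.
rows = [[float(i), float(i % 2)] for i in range(6)]
targets = [2.0 * r[0] for r in rows]
model = lambda r: 2.0 * r[0]

informative = permutation_importance(model, rows, targets, feature=0)
noise = permutation_importance(model, rows, targets, feature=1)
print(informative, noise)
```

In this toy setup the uninformative covariate gets an importance of exactly zero, because the model never reads it.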

permutation importance, sample size, signal-to-noise ratio, (16 more...)

1912.03306

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

arXiv.org Machine Learning Dec-4-2019

RoNGBa: A Robustly Optimized Natural Gradient Boosting Training Approach with Leaf Number Clipping

Ren, Liliang, Sun, Gen, Wu, Jiaman

Natural gradient has been recently introduced to the field of boosting to enable the generic probabilistic prediction capability. Natural gradient boosting shows promising performance improvements on small datasets due to better training dynamics, but it suffers from slow training speed, especially for large datasets. We present a replication study of NGBoost (Duan et al., 2019) training that carefully examines the impacts of key hyper-parameters under the circumstance of best-first decision tree learning. We find that with the regularization of leaf number clipping, the performance of NGBoost can be largely improved via a better choice of hyperparameters. Experiments show that our approach significantly beats the state-of-the-art performance on various kinds of datasets from the UCI Machine Learning Repository while still achieving up to a 4.85x speedup compared with the original approach of NGBoost.

dataset, gradient, ngboost, (13 more...)

1912.02338

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)

Drews, Samuel, Albarghouthi, Aws, D'Antoni, Loris

Proving Data-Poisoning Robustness in Decision Trees

arXiv.org Artificial Intelligence Dec-2-2019

Machine learning models are brittle, and small changes in the training data can result in different predictions. We study the problem of proving that a prediction is robust to data poisoning, where an attacker can inject a number of malicious elements into the training set to influence the learned model. We target decision-tree models, a popular and simple class of machine learning models that underlies many complex learning techniques. We present a sound verification technique based on abstract interpretation and implement it in a tool called Antidote. Antidote abstractly trains decision trees for an intractably large space of possible poisoned datasets. Due to the soundness of our abstraction, Antidote can produce proofs that, for a given input, the corresponding prediction would not have changed had the training set been tampered with or not. We demonstrate the effectiveness of Antidote on a number of popular datasets.

dataset, predicate, robustness, (17 more...)

1912.00981

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligence Nov-30-2019, 07:28:10 GMT

How Spotify know a lot about you using machine learning and AI.

Spotify is one of the best music streaming services on the market. But what excites us the most is the amazing ways it enhances the user experience. We are all familiar with "Discover Weekly", a personalized playlist unique to each user. It uses artificial intelligence and machine learning algorithms to generate the playlist. It learns from your music preferences, streaming history or how many times you listened to a particular song.

music, playlist, spotify, (14 more...)

Country: Asia > India (0.05)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.30)

Provoost, Jesper, Wismans, Luc, Van der Drift, Sander, Kamilaris, Andreas, Van Keulen, Maurice

Short Term Prediction of Parking Area states Using Real Time Data and Machine Learning Techniques

arXiv.org Machine Learning Nov-29-2019

Public road authorities and private mobility service providers need information derived from the current and predicted traffic states to act upon the daily urban system and its spatial and temporal dynamics. In this research, a real-time parking area state (occupancy, in- and outflux) prediction model (up to 60 minutes ahead) has been developed using publicly available historic and real-time data sources. Based on a case study in a real-life scenario in the city of Arnhem, a Neural Network-based approach outperforms a Random Forest-based one on all assessed performance measures, although the differences are small. Both outperform a naive seasonal random walk model. Although the performance degrades with increasing prediction horizon, the model shows a performance gain of over 150% at a prediction horizon of 60 minutes compared with the naive model. Furthermore, it is shown that predicting the in- and outflux is a far more difficult task (i.e. performance gains of 30%) which needs more training data, not based exclusively on occupancy rate. However, the performance of predicting in- and outflux is less sensitive to the prediction horizon. In addition, it is shown that real-time information of current occupancy rate is the independent variable with the highest contribution to the performance, although time, traffic flow and weather variables also deliver a significant contribution. During real-time deployment, the model performs three times better than the naive model on average. As a result, it can provide valuable information for proactive traffic management as well as mobility service providers.
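To make the baseline concrete, a seasonal naive forecast simply repeats the value observed one season earlier. The sketch below is one common reading of a "naive seasonal random walk"; the abstract does not define the paper's baseline exactly, so the function and its `season_length` parameter are assumptions.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast `horizon` steps ahead by repeating the last full season."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


# Hypothetical occupancy counts with a season of three time steps.
occupancy = [10, 20, 30, 12, 22, 32]
print(seasonal_naive_forecast(occupancy, season_length=3, horizon=4))
# → [12, 22, 32, 12]
```

A learned model is then judged by how much it improves on this forecast at each prediction horizon.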

occupancy rate, prediction, provoost, (15 more...)

1911.13178

Country:

Europe > Netherlands > Gelderland > Arnhem (0.25)
Europe > Germany (0.04)
Africa > Nigeria > Plateau State (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Masoud, Sara, Chowdhury, Bijoy, Son, Young-Jun, Kubota, Chieri, Tronstad, Russell

A Dynamic Modelling Framework for Human Hand Gesture Task Recognition

arXiv.org Machine Learning Nov-28-2019

Gesture recognition and hand motion tracking are important tasks in advanced gesture-based interaction systems. In this paper, we propose to apply a sliding windows filtering approach to sample the incoming streams of data from data gloves and a decision tree model to recognize the gestures in real time for a manual grafting operation of a vegetable seedling propagation facility. The sequence of these recognized gestures defines the tasks that are taking place, which helps to evaluate individuals' performances and to identify any bottlenecks in real time. In this work, two pairs of data gloves are utilized, which report the location of the fingers, hands, and wrists wirelessly (i.e., via Bluetooth). To evaluate the performance of the proposed framework, a preliminary experiment was conducted in multiple lab settings of tomato grafting operations, where multiple subjects wear the data gloves while performing different tasks. Our results show an accuracy of 91% on average, in terms of gesture recognition in real time by employing our proposed framework.
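One way to picture sliding-window sampling of a sensor stream is the generator below, which emits fixed-size windows that a classifier such as a decision tree could then label. The window size, step, and function name are assumptions for illustration; the abstract does not give the paper's actual filtering parameters.

```python
from collections import deque


def sliding_windows(stream, size, step=1):
    """Yield fixed-size windows (as lists) over an iterable of samples."""
    window = deque(maxlen=size)  # oldest sample drops out automatically
    since_last = 0
    for sample in stream:
        window.append(sample)
        since_last += 1
        # Emit once the window is full and `step` new samples have arrived.
        if len(window) == size and since_last >= step:
            yield list(window)
            since_last = 0


print(list(sliding_windows(range(6), size=3, step=2)))
# → [[0, 1, 2], [2, 3, 4]]
```

Because it is a generator, it can consume an unbounded stream and hand each window to the recognizer as soon as it is complete.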

data glove, recognition, sensor, (15 more...)

1911.03923

Country:

North America > United States > New York > New York County > New York City (0.05)
South America > Chile > Valparaíso Region > Valparaíso Province > Valparaíso (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(6 more...)

Genre: Research Report > New Finding (0.69)

Industry:

Food & Agriculture > Agriculture (0.70)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Gesture Recognition (0.96)

#artificialintelligence Nov-27-2019, 18:05:01 GMT

Understanding Decision Trees In Machine Learning and How To Implement It In Python Using sklearn

Decision Trees are a type of supervised learning used for classification (yes/no) and regression (continuous data) where the data is continuously split according to a certain parameter. The predicted class is derived from features of the data. The following article creates a Decision Tree from the 311 on 3.11 Project. In this project, the resolution outcome being positive or negative is what is being predicted.

Agency: NYPD, Dept of Transportation, Dept of Health & Mental Hygiene, Dept of Sanitation, Dept of Housing Preservation and Development, Dept of Parks and Recreation, etc
Borough: Brooklyn, Queens, Manhattan, Bronx, Staten Island
Location: Longitude/Latitude, Cross Streets, Intersections
Created/Closed Date
Complaint Type: Heat/Hot Water, Rodent, Noise, Street Condition, Illegal Parking, Unsanitary Condition, Blocked Driveway are just a few examples.

decision tree, dept, machine learning, (5 more...)

Country: North America > United States > New York > Richmond County > New York City (0.26)

Industry: Transportation (0.97)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.98)