AITopics

Industry: Banking & Finance > Trading (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.57)

#artificialintelligence Oct-18-2019, 00:53:51 GMT

SAS Tutorial Python Integration with SAS Viya

In this SAS How To Tutorial, Ari Zitin explores several examples of Python integration with SAS. There are many SAS Viya Cloud Analytic Services (CAS) that can be submitted from Python. In this Python integration demo, Ari focuses on predictive modeling. He shows how to connect to CAS, access in-memory data, bring data locally to use Pandas, and prepare data for predictive modeling. Ari then steps through how to build, score and assess a Decision Tree model.

predictive modeling, sas tutorial python integration, sas viya, (4 more...)

Industry: Education > Educational Setting (0.33)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Yang, Li, Moubayed, Abdallah, Hamieh, Ismail, Shami, Abdallah

Tree-based Intelligent Intrusion Detection System in Internet of Vehicles

arXiv.org Machine Learning Oct-18-2019

Abstract--The use of autonomous vehicles (AVs) is a promising technology in Intelligent Transportation Systems (ITSs) to improve safety and driving efficiency. Vehicle-to-everything (V2X) technology enables communication among vehicles and other infrastructures. However, AVs and Internet of Vehicles (IoV) are vulnerable to different types of cyber-attacks such as denial of service, spoofing, and sniffing attacks. In this paper, an intelligent intrusion detection system (IDS) is proposed based on tree-structure machine learning models. The results from the implementation of the proposed intrusion detection system on standard data sets indicate that the system has the ability to identify various cyber-attacks in the AV networks. Furthermore, the proposed ensemble learning and feature selection approaches enable the proposed system to achieve high detection rate and low computational cost simultaneously. With more vehicles, devices, and infrastructures involved, the conventional vehicular ad hoc networks (VANETs) are gradually evolving into the Internet of Vehicles (IoV) [1].

accuracy, algorithm, xgboost, (14 more...)

1910.08635

Country:

Asia > China (0.14)
North America > Canada > Ontario > Middlesex County > London (0.04)
North America > United States (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.70)

#artificialintelligence Oct-17-2019, 11:19:22 GMT

uLektz Skills Latest Industry Required Skill Courses

Data Science is the study of the generalizable extraction of knowledge from data. This course serves as an introduction to the data science principles required to tackle data-rich problems in business and academia, including: Statistical Inference, Machine Learning, Machine Learning algorithms, Classification techniques, Decision Tree, Clustering, Recommender Engines, Text Mining & Time series. The Data Science course enables you to gain knowledge of the entire life cycle of Data Science, analyze and visualize different data sets, and work with different Machine Learning algorithms such as K-Means Clustering, Decision Trees, Random Forest, and Naive Bayes.

data science, latest industry required skill course, ulektz skill, (2 more...)

Country: Asia > India > Chandigarh (0.40)

Genre: Instructional Material > Course Syllabus & Notes (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.75)

Saberian, Mohammad, Delgado, Pablo, Raimond, Yves

Gradient Boosted Decision Tree Neural Network

arXiv.org Machine Learning Oct-17-2019

In this paper we propose a method to build a neural network that is similar to an ensemble of decision trees. We first illustrate how to convert a learned ensemble of decision trees to a single neural network with one hidden layer and an input transformation. We then relax some properties of this network such as thresholds and activation functions to train an approximately equivalent decision tree ensemble. The final model, Hammock, is surprisingly simple: a fully connected two-layer neural network where the input is quantized and one-hot encoded. Experiments on large and small datasets show this simple method can achieve performance similar to that of Gradient Boosted Decision Trees.
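The input transformation described in this abstract can be illustrated with a minimal sketch. It is not the authors' Hammock implementation: the function names, the hand-picked bin edges, and the ReLU activation are all assumptions made for illustration, and the weights are set by hand rather than trained.

```python
from bisect import bisect_right


def quantize_one_hot(row, bin_edges):
    """Quantize each feature with its own sorted bin edges, then one-hot encode the bin.

    A feature with n edges falls into one of n + 1 bins.
    """
    encoded = []
    for value, edges in zip(row, bin_edges):
        bin_index = bisect_right(edges, value)  # integer in 0..len(edges)
        one_hot = [0.0] * (len(edges) + 1)
        one_hot[bin_index] = 1.0
        encoded.extend(one_hot)
    return encoded


def dense(vector, weights, bias):
    """Fully connected layer; weights[k] holds the weight vector of output unit k."""
    return [
        sum(w * v for w, v in zip(unit_weights, vector)) + b
        for unit_weights, b in zip(weights, bias)
    ]


def two_layer_forward(x, w1, b1, w2, b2):
    """Hidden layer with ReLU (an assumed activation), followed by a linear output layer."""
    hidden = [max(0.0, h) for h in dense(x, w1, b1)]
    return dense(hidden, w2, b2)


# Hypothetical example: two features, with one and two bin edges respectively.
x = quantize_one_hot([0.1, 5.0], [[0.5], [2.0, 4.0]])
score = two_layer_forward(
    x,
    w1=[[1, 0, 0, 0, 1], [0, 1, 1, 0, 0]], b1=[0.0, 0.0],
    w2=[[0.5, -1.0]], b2=[0.1],
)
```

Because each feature activates exactly one bin, a hidden unit that sums selected bins behaves like a conjunction of threshold tests, which is loosely how a tree path can be represented in such a network.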

decision tree, neural network, transformation, (13 more...)

1910.0934

Country: North America > United States > California > San Francisco County > San Francisco (0.15)

Genre: Research Report (0.51)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Zhang, Wenhao, Ramezani, Ramin, Naeim, Arash

WOTBoost: Weighted Oversampling Technique in Boosting for imbalanced learning

arXiv.org Machine Learning Oct-17-2019

Machine learning classifiers often stumble over imbalanced datasets where classes are not equally represented. This inherent bias towards the majority class may result in low accuracy in labeling the minority class. Imbalanced learning is prevalent in many real-world applications, such as medical research, network intrusion detection, and fraud detection in credit card transactions. A good number of research works have been reported to tackle this challenging problem. For example, SMOTE (Synthetic Minority Over-sampling TEchnique) and ADASYN (ADAptive SYNthetic sampling approach) use oversampling techniques to balance the skewed datasets. In this paper, we propose a novel method which combines a Weighted Oversampling Technique and an ensemble Boosting method to improve the classification accuracy of minority data without sacrificing the accuracy of the majority class. WOTBoost adjusts its oversampling strategy at each round of boosting to synthesize more targeted minority data samples. The adjustment is enforced using a weighted distribution. We compared WOTBoost with 4 other classification models (i.e., decision tree, SMOTE + decision tree, ADASYN + decision tree, SMOTEBoost) extensively on 18 publicly accessible imbalanced datasets. WOTBoost achieved the best G mean on 6 datasets and the highest AUC score on 7 datasets.
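Two ideas from this abstract can be sketched briefly: the G mean metric (the geometric mean of sensitivity and specificity) and SMOTE-style oversampling by interpolating between minority neighbours. This is a minimal sketch under stated assumptions, not WOTBoost itself: it uses uniform rather than weighted sampling, and the function names and parameters (`k`, `seed`) are hypothetical.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def interpolate_minority(points, n_new, k=2, seed=0):
    """SMOTE-style synthesis: move a random fraction of the way from a minority
    point towards one of its k nearest minority neighbours.

    Requires at least two minority points. Base points are drawn uniformly here;
    a weighted scheme would change only how the base index is chosen.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_index = rng.randrange(len(points))
        base = points[base_index]
        others = [p for i, p in enumerate(points) if i != base_index]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic
```

Unlike plain accuracy, the G mean drops to zero whenever a classifier ignores one class entirely, which is why it suits imbalanced evaluation.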

algorithm, dataset, minority class, (14 more...)

1910.07892

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Wisconsin (0.04)
Asia (0.04)

Genre: Research Report (1.00)

Industry:

Banking & Finance > Credit (0.54)
Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.89)

#artificialintelligence Oct-15-2019, 17:53:53 GMT

Data Lake Machine Learning Models with Python and Dremio

Amazon Simple Storage Service (S3) is an object storage service that offers high availability and reliability, easy scaling, security, and performance. Many companies all around the world use Amazon S3 to store and protect their data. PostgreSQL is an open-source object-relational database system. In addition to many useful features, PostgreSQL is highly extensible, which makes it easier to handle even the most complicated data workloads. In this article, we will show how to load data into Amazon S3 and PostgreSQL, then how to connect these sources to Dremio, and how to perform data curation.

classifier, dataset, dremio, (11 more...)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.33)

Anghel, Andreea, Ioannou, Nikolas, Parnell, Thomas, Papandreou, Nikolaos, Mendler-Dünner, Celestine, Pozidis, Haris

Breadth-first, Depth-next Training of Random Forests

arXiv.org Machine Learning Oct-15-2019

In this paper we analyze, evaluate, and improve the performance of training Random Forest (RF) models on modern CPU architectures. An exact, state-of-the-art binary decision tree building algorithm is used as the basis of this study. Firstly, we investigate the trade-offs between using different tree building algorithms, namely breadth-first-search (BFS) and depth-first-search (DFS). We design a novel, dynamic, hybrid BFS-DFS algorithm and demonstrate that it performs better than both BFS and DFS, and is more robust in the presence of workloads with different characteristics. Secondly, we identify CPU performance bottlenecks when generating trees using this approach, and propose optimizations to alleviate them. The proposed hybrid tree building algorithm for RF is implemented in the Snap Machine Learning framework, and speeds up the training of RFs by 7.8x on average when compared to state-of-the-art RF solvers (sklearn, H2O, and xgboost) on a range of datasets, RF configurations, and multi-core CPU architectures.
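The difference between the BFS and DFS building strategies comes down to the order in which tree nodes are expanded. The sketch below shows that order on a complete binary tree with heap-style node ids. The hybrid shown switches from BFS to DFS at a fixed depth, which is a simplifying assumption: the abstract describes the paper's hybrid as dynamic and does not give its switching rule. All function names are hypothetical.

```python
from collections import deque


def children(node, depth, max_depth):
    """Children of a heap-indexed node (2i + 1 and 2i + 2), or none at the depth limit."""
    if depth >= max_depth:
        return []
    return [(2 * node + 1, depth + 1), (2 * node + 2, depth + 1)]


def bfs_order(max_depth, root=(0, 0)):
    """Expand nodes level by level using a FIFO queue."""
    queue, order = deque([root]), []
    while queue:
        node, depth = queue.popleft()
        order.append(node)
        queue.extend(children(node, depth, max_depth))
    return order


def dfs_order(max_depth, root=(0, 0)):
    """Expand one branch to the bottom before backtracking, using a LIFO stack."""
    stack, order = [root], []
    while stack:
        node, depth = stack.pop()
        order.append(node)
        # Push the right child first so the left child is expanded next.
        stack.extend(reversed(children(node, depth, max_depth)))
    return order


def hybrid_order(max_depth, switch_depth):
    """BFS above switch_depth, then DFS inside each remaining subtree (assumed rule)."""
    queue, order, subtree_roots = deque([(0, 0)]), [], []
    while queue:
        node, depth = queue.popleft()
        if depth >= switch_depth:
            subtree_roots.append((node, depth))
            continue
        order.append(node)
        queue.extend(children(node, depth, max_depth))
    for root in subtree_roots:
        order.extend(dfs_order(max_depth, root))
    return order
```

For a tree of depth 2, `bfs_order(2)` gives `[0, 1, 2, 3, 4, 5, 6]` and `dfs_order(2)` gives `[0, 1, 3, 4, 2, 5, 6]`. BFS touches every node of a level together, while DFS keeps working on the small data subset of one branch; a hybrid tries to combine the two access patterns.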

algorithm, matrix, node, (15 more...)

1910.06853

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.05)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Machine Learning Oct-14-2019

A note on the consistency of the random forest algorithm

Ferreira, José A.

Nowadays, the algorithm is acknowledged to be easy to use and to perform very well in general, even in problems involving many predictor variables (see for instance Biau and Scornet (2016) or the introduction to Scornet, Biau and Vert (2015)); so well, indeed, that several authors have posed and studied the question of its consistency (see Scornet, Biau and Vert (2015) and the earlier references provided by them). Consistent nonparametric statistical predictors have been known for a long time (e.g. Nadaraya (1964), Watson (1964), Stone (1977), Devroye and Wagner (1980)), but they converge very slowly and their computer implementations tend to be slow, especially when they involve many variables. In view of their comparative accuracy and high speed of implementation, random forests would become even more attractive if they were shown to be consistent under general data-generating mechanisms. Besides, consistency is almost indispensable in applications of statistical prediction to the estimation of 'causal effects' based on observational data (e.g.

artificial intelligence, machine learning, (17 more...)

1910.00943

Country:

North America > United States > New York (0.04)
Europe > Netherlands (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

#artificialintelligence Oct-13-2019, 17:43:35 GMT

What is Data Science?

Data Science is considered one of the most modern and fascinating jobs of our time. It can be fun and can give you satisfaction, but is it really as it's described? At the beginning of their career, Data Scientists think that Data Science is a wonderful, magical world full of algorithms, Python functions that perform every possible spell with a line of code, and statistical models able to detect the most useful correlations among data that could make you an invincible superhero in your company. You start dreaming about your CEO congratulating you and shaking your hand, you begin to see decision trees and clusters everywhere and, of course, the most terrifying neural network architectures your mind can dream up. But from the very first day of your first Data Science project, you start to realize what reality is.

algorithm, data science, never forget, (11 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.36)