decision tree learning

An Enhanced Secure Deep Learning Algorithm for Fraud Detection in Wireless Communication


In today’s era of technology, especially in Internet commerce and banking, the number of credit-card transactions has been increasing rapidly, and the card has become the most widely used instrument for Internet shopping. This growth has been accompanied by a considerable rise in fraud cases. Stopping fraudulent transactions is essential because of their financial impact, and anomaly detection has important applications in fraud detection. A novel framework that integrates Spark with a deep learning approach is proposed in this work. This work also applies several machine learning techniques for detecting fraud, such as random forest, SVM, logistic regression, decision tree, and KNN, and compares them using various parameters. More than 96% accuracy was obtained on both the training and testing datasets. Existing systems such as Cardwatch and web-service-based fraud detection need labelled data for both genuine and fraudulent transactions, and cannot discover new types of fraud. The dataset used contains transactions made with credit cards in September 2013 by European cardholders. It covers transactions from two days, among which there are 492 fraud transactions out of 284,807, i.e., 0.172% of all transactions.
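The abstract compares several classifiers on a highly imbalanced dataset. As a minimal sketch of that kind of comparison (using synthetic data as a stand-in for the credit-card dataset, with a fraud rate close to the 0.172% reported above), the models can be trained and scored side by side with scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the imbalanced credit-card data
# (roughly 0.2% positive class, mirroring the 0.172% fraud rate).
X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.998], flip_y=0, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Note that on data this imbalanced, raw accuracy is easy to inflate (always predicting "genuine" already scores 99.8%), so metrics such as recall or precision on the fraud class are usually more informative.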

Data on Machine Learning Described by Researchers at University of New South Wales (Learning from machines to close the gap between funding and expenditure in the Australian National Disability Insurance Scheme): Machine Learning


By a News Reporter-Staff News Editor at Insurance Daily News -- New research on artificial intelligence is the subject of a new report. According to news reporting originating from Canberra, Australia, by NewsRx correspondents, research stated, "The Australian National Disability Insurance Scheme (NDIS) allocates funds to participants for purchase of services." Our news reporters obtained a quote from the research from University of New South Wales: "Only one percent of the 89,299 participants spent all of their allocated funds with 85 participants having failed to spend any, meaning that most of the participants were left with unspent funds. The gap between the allocated budget and realised expenditure reflects misallocation of funds. Thus we employ alternative machine learning techniques to estimate budget and close the gap while maintaining the aggregate level of spending. Three experiments are conducted to test the machine learning models in estimating the budget, expenditure and the resulting gap; compare the learning rate between machines and humans; and identify the significant explanatory variables."

A Complete Guide to Decision Trees


The Decision Tree is a machine learning algorithm that takes its name from its tree-like structure, which represents multiple decision stages and the possible response paths. Decision trees provide good results for classification tasks and regression analyses. The tree structure not only visualizes the various decision levels but also puts them in a definite order. For an individual data point, a prediction, for example a class label, is made by following the observations along the branches until a target value is reached. Depending on the target variable, decision trees are used for classification or regression.
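A minimal sketch of this classification use case, here with scikit-learn's `DecisionTreeClassifier` on the built-in Iris dataset (the dataset and hyperparameter choices are illustrative assumptions, not from the article):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# max_depth limits the number of decision levels in the tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# A prediction for a single data point follows the branches to a leaf.
print("test accuracy:", round(tree.score(X_test, y_test), 3))
print("predicted class for the first test sample:",
      iris.target_names[tree.predict(X_test[:1])[0]])
```

The fitted tree can also be inspected with `sklearn.tree.plot_tree` to visualize the decision levels the text describes.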

Hyperparameter Tuning of Decision Tree Classifier Using GridSearchCV


A model can have many hyperparameters, and grid search is a method for finding the best combination of them. Grid search is a hyperparameter-tuning technique that builds and evaluates a model for every combination of algorithm parameters specified in a grid. We might use 10-fold cross-validation to score each candidate combination. To obtain the best set of hyperparameters, we will use the grid search method via GridSearchCV.
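As a sketch of this procedure, `GridSearchCV` can tune a `DecisionTreeClassifier` with 10-fold cross-validation (the dataset and parameter grid below are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameter grid: every combination is evaluated.
param_grid = {
    "max_depth": [2, 3, 5, 10],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}

# 10-fold cross-validation for each of the 4 * 3 * 2 = 24 combinations.
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    param_grid, cv=10)
grid.fit(X, y)

print("best parameters:", grid.best_params_)
print("best cross-validated accuracy:", round(grid.best_score_, 3))
```

After fitting, `grid.best_estimator_` is a tree retrained on the full data with the winning hyperparameters, ready for prediction.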

Introduction to Random Forest Algorithm


Random Forest is a supervised machine learning algorithm that is composed of individual decision trees. This type of model is called an ensemble model because an "ensemble" of independent models is used to compute a result. The basis of the Random Forest is formed by many individual decision trees. A tree consists of different decision levels and branches that are used to classify data. The Decision Tree algorithm tries to split the training data into classes so that the objects within a class are as similar as possible and the objects of different classes are as different as possible. Such a tree can, for example, decide whether to do sports outside or not, depending on the weather variables "weather", "humidity", and "wind force".
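A minimal sketch of this "play sports outside?" example with a Random Forest (the toy data below is an invented illustration of the three weather variables, not from the article):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy data for the example: weather (0=sunny, 1=overcast, 2=rainy),
# humidity in percent, wind force in km/h.
X = pd.DataFrame({
    "weather":    [0, 0, 1, 2, 2, 2, 1, 0, 0, 2],
    "humidity":   [85, 90, 78, 96, 80, 70, 65, 95, 70, 80],
    "wind_force": [10, 25, 10, 10, 10, 30, 30, 10, 10, 10],
})
y = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]  # 1 = play outside, 0 = stay in

# An ensemble of 50 independent trees votes on the final class.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# Sunny, moderate humidity, light wind -> the forest's majority vote.
sample = pd.DataFrame({"weather": [0], "humidity": [70], "wind_force": [5]})
print("play outside?", forest.predict(sample)[0])
```

Each tree is trained on a bootstrap sample of the data with a random subset of features, which is what makes the individual trees independent enough for their majority vote to generalize better than any single tree.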

Discriminatory AI explained with an example


AI is increasingly used to make decisions that affect us directly, such as job applications, our credit rating, and match-making on dating sites. It is therefore important that AI is non-discriminatory and that its decisions do not favor particular races, genders, or skin colors. Discriminatory AI is a very broad subject that goes beyond purely technical aspects. However, to make it easily understandable, I will demonstrate what discriminatory AI looks like using examples and visuals. This will give you a way to spot a discriminatory AI. Let me first establish the context of the example.

Understanding your Neural Network's predictions


Neural networks are extremely convenient. They are usable for both regression and classification, work on structured and unstructured data, handle temporal data very well, and can usually reach high performance if they are given a sufficient amount of data. What is gained in convenience, however, is lost in interpretability, and that can be a major setback when models are presented to a non-technical audience, such as clients or stakeholders. For instance, last year the Data Science team I am part of wanted to convince a client to move from a decision tree model to a neural network, and for good reasons: we had access to a large amount of data, and most of it was temporal. The client was on board but wanted to keep an understanding of what the model based its decisions on, which means evaluating its features' importance.
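One model-agnostic way to evaluate a neural network's feature importance is permutation importance: shuffle one feature at a time and measure how much the score drops. The sketch below uses scikit-learn's `MLPClassifier` and a built-in dataset as stand-ins (the article does not specify the model or data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)

# Neural networks train much better on standardized inputs.
scaler = StandardScaler().fit(X_train)
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(scaler.transform(X_train), y_train)

# Shuffle each feature in turn and measure the drop in test accuracy:
# the larger the drop, the more the network relies on that feature.
result = permutation_importance(net, scaler.transform(X_test), y_test,
                                n_repeats=10, random_state=0)

for i in result.importances_mean.argsort()[::-1][:3]:
    print(f"{data.feature_names[i]}: {result.importances_mean[i]:.3f}")
```

Because it only needs predictions, the same procedure works unchanged for a decision tree, a random forest, or any other model, which makes it convenient for the kind of before/after comparison described above.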

The Application of Machine Learning Techniques for Predicting Match Results in Team Sport: A Review

Journal of Artificial Intelligence Research

Predicting the results of matches in sport is a challenging and interesting task. In this paper, we review a selection of studies from 1996 to 2019 that used machine learning for predicting match results in team sport. Considering both invasion sports and striking/fielding sports, we discuss commonly applied machine learning algorithms, as well as common approaches related to data and evaluation. Our study considers accuracies that have been achieved across different sports, and explores whether evidence exists to support the notion that outcomes of some sports may be inherently more difficult to predict. We also uncover common themes of future research directions and propose recommendations for future researchers. Although there remains a lack of benchmark datasets (apart from in soccer), and the differences between sports, datasets and features make between-study comparisons difficult, as we discuss, it is possible to evaluate accuracy performance in other ways. Artificial Neural Networks were commonly applied in early studies; however, our findings suggest that a range of models should instead be compared. Selecting and engineering an appropriate feature set appears to be more important than having a large number of instances. For feature selection, we see potential for greater inter-disciplinary collaboration between sport performance analysis, a sub-discipline of sport science, and machine learning.

Decision Trees vs Random Forest


Last week I published two articles about Decision Trees: one about the Classification and Regression Tree (CART) and another tutorial on how to implement a Random Forest classifier. These two methods may look very similar; however, there are important differences that every data professional or enthusiast should know.