Accuracy


WWE Backlash 2017: Predictions, Match Card For 'SmackDown Live' PPV

International Business Times

For the first time in more than three months, a "SmackDown Live" pay-per-view is on the schedule. WWE Backlash 2017 is set for Sunday night in Rosemont, Illinois, at the Allstate Arena, with a few new faces set to compete in some of the card's biggest matches. Below are WWE Backlash predictions for every match on the card. Eight matches are scheduled, and three championships will be on the line. It was pretty surprising when Mahal became the No. 1 contender for the top belt on "SmackDown Live," and it would be even more shocking to see him win the title.


How to create text classifiers with Machine Learning

@machinelearnbot

Building a quality machine learning model for text classification can be a challenging process. You need to build a training dataset, test different parameters for your model, and fix the confusions between tags, among other things. In this post, we describe how you can successfully train text classifiers with machine learning using MonkeyLearn. What are the categories or tags that you want to assign to your texts? This is the first question you need to answer when you start working on your text classifier.


On ROC Curve Analysis of Artificial Neural Network Classifiers

AAAI Conferences

Receiver operating characteristic (ROC) curves are of great interest in evaluating many security systems, such as biometric authentication. They visualize the trade-off between the number of security breaches and the level of convenience. In earlier work, ROC curves and their decision boundaries were studied for various classifiers. Here, further studies are conducted to identify problems in ROC curve analysis when the net values of artificial neural network (ANN) classifiers are used. Graphical decision boundaries and experimental results on the IRIS biometric authentication system reveal over-fitting in the ROC curve analysis. These graphical decision boundaries suggest that ANN classifiers with two output units are more desirable than those with a single output unit for two-class classification problems.
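To make the trade-off concrete, here is a minimal Python sketch (not code from the paper) that sweeps a decision threshold over classifier scores, for example the net value of a single ANN output unit, and records the false positive rate (impostors accepted) and true positive rate (genuine users accepted) at each threshold. The scores, labels and function names are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores.

    labels: 1 = genuine user (positive), 0 = impostor (negative).
    A sample is accepted when its score >= threshold.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    # +inf accepts nobody, so the curve starts at (0, 0); then every distinct score.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores, e.g. the net value of a single output unit.
scores = [0.9, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
pts = roc_points(scores, labels)
area = auc(pts)
print(round(area, 4))  # 0.8333
```

Each point corresponds to one operating threshold, so moving along the curve trades convenience (higher TPR) against security (higher FPR).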


To tune or not to tune the number of trees in random forest?

arXiv.org Machine Learning

The number of trees T in the random forest (RF) algorithm for supervised learning has to be set by the user. It is controversial whether T should simply be set to the largest computationally manageable value or whether a smaller T may in some cases be better. While the principle underlying bagging is that "more trees are better", in practice the classification error rate sometimes reaches a minimum before increasing again as the number of trees grows. The goal of this paper is four-fold: (i) providing theoretical results showing that the expected error rate may be a non-monotonic function of the number of trees and explaining under which circumstances this happens; (ii) providing theoretical results showing that such non-monotonic patterns cannot be observed for other performance measures such as the Brier score and the logarithmic loss (for classification) and the mean squared error (for regression); (iii) illustrating the extent of the problem through an application to a large number (n = 306) of datasets from the public database OpenML; and (iv) arguing in favor of setting T to a computationally feasible large number, depending on the convergence properties of the desired performance measure.
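The mechanism behind result (i) can be illustrated with a simplified model, which is an assumption and not the paper's derivation: treat each tree as voting independently on one observation and being correct with probability p. The majority vote then improves with T when p > 0.5 and degrades when p < 0.5, so averaging over observations on both sides can give an error curve that first falls and then rises. The p values below are hypothetical.

```python
from math import exp, lgamma, log


def majority_vote_error(p_correct, n_trees):
    """P(majority of n_trees votes is wrong) for one observation in a two-class
    problem, assuming each tree is independently correct with prob. p_correct."""
    if n_trees % 2 == 0:
        raise ValueError("use an odd number of trees so ties cannot occur")
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must lie strictly between 0 and 1")
    need = n_trees // 2 + 1  # correct votes needed for a correct majority
    # Binomial probabilities in log space to avoid overflow for large n_trees.
    log_norm = lgamma(n_trees + 1)
    p_right = sum(
        exp(log_norm - lgamma(k + 1) - lgamma(n_trees - k + 1)
            + k * log(p_correct) + (n_trees - k) * log(1 - p_correct))
        for k in range(need, n_trees + 1)
    )
    return 1.0 - p_right


def expected_error(p_list, n_trees):
    """Average majority-vote error over a set of observations."""
    return sum(majority_vote_error(p, n_trees) for p in p_list) / len(p_list)


# Hypothetical per-tree accuracies: two "easy" observations and one slightly "hard" one.
p_list = [0.7, 0.7, 0.48]
for t in (1, 11, 101, 1001):
    print(t, round(expected_error(p_list, t), 3))
```

Under this toy model the error drops over the first values of T and rises again by T = 1001, because the hard observation drifts toward being misclassified with certainty more slowly than the easy ones become certain to be correct.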


The Best Metric to Measure Accuracy of Classification Models

@machinelearnbot

Unlike evaluating the accuracy of models that predict a continuous or discrete dependent variable, like linear regression models, evaluating the accuracy of a classification model can be more complex and time-consuming. Before measuring the accuracy of classification models, an analyst would first measure their robustness with the help of metrics such as AIC and BIC, AUC-ROC, AUC-PR, the Kolmogorov-Smirnov chart, etc. The next logical step is to measure accuracy. To understand the complexity behind measuring accuracy, we need to know a few basic concepts. For example, a classification model like logistic regression outputs a probability between 0 and 1 instead of the desired value of the actual target variable, such as Yes/No.
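As a small illustration of that last point, the Python sketch below (hypothetical probabilities and labels) converts predicted probabilities into Yes/No labels with a cut-off and computes accuracy and confusion-matrix counts. It shows that the measured accuracy depends on the chosen threshold.

```python
def classify(probabilities, threshold=0.5):
    """Map predicted probabilities P(Yes) to Yes/No labels with a cut-off."""
    return ["Yes" if p >= threshold else "No" for p in probabilities]


def confusion_counts(predicted, actual, positive="Yes"):
    """Return (TP, TN, FP, FN) for a two-class problem."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    tn = sum(1 for p, a in pairs if p != positive and a != positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    return tp, tn, fp, fn


def accuracy(predicted, actual):
    """Share of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


probs = [0.92, 0.40, 0.65, 0.10, 0.55]      # hypothetical model outputs
actual = ["Yes", "Yes", "No", "No", "Yes"]  # hypothetical true labels

pred = classify(probs)                      # ['Yes', 'No', 'Yes', 'No', 'Yes']
print(accuracy(pred, actual))               # 0.6
print(confusion_counts(pred, actual))       # (2, 1, 1, 1)
print(accuracy(classify(probs, 0.3), actual))  # 0.8
```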


Loan Prediction – Using PCA and Naive Bayes Classification with R

@machinelearnbot

Nowadays, bank loans carry numerous risks for both the banks and the borrowers. Risk analysis for bank loans requires understanding the risk and its level. Banks need to analyze their customers for loan eligibility so that they can target those customers specifically. Banks want to automate the loan eligibility process (in real time) based on customer details such as gender, marital status, age, occupation, income, debts, and other information provided in their online application form. As the number of transactions in the banking sector grows rapidly and huge data volumes become available, customer behavior can be easily analyzed and loan risks can be reduced.
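The post works in R; as an illustration of the same two steps in this page's example language, here is a Python/NumPy sketch of PCA followed by Gaussian Naive Bayes. The data are synthetic numeric stand-ins for encoded application fields, not the post's dataset, and the number of components (2) and the 150/50 train/test split are arbitrary choices.

```python
import numpy as np


def pca_fit(X, n_components):
    """Column means and the top principal directions of X (rows = applicants)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project centered rows of X onto the kept principal directions."""
    return (X - mean) @ components.T


def gnb_fit(Z, y):
    """Class priors, per-feature means and variances for Gaussian Naive Bayes."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    variances = np.array([Z[y == c].var(axis=0) + 1e-9 for c in classes])
    return classes, priors, means, variances


def gnb_predict(Z, model):
    """Pick the class with the highest log posterior for each row of Z."""
    classes, priors, means, variances = model
    diff = Z[:, None, :] - means[None, :, :]  # shape (n, n_classes, k)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    return classes[np.argmax(log_lik + np.log(priors)[None, :], axis=1)]


# Synthetic stand-in data: 5 numeric features, label 1 = eligible, 0 = not eligible.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 5)) + y[:, None] * np.array([2.0, 1.5, 0.0, -1.0, 0.5])

mean, comps = pca_fit(X[:150], n_components=2)
model = gnb_fit(pca_transform(X[:150], mean, comps), y[:150])
pred = gnb_predict(pca_transform(X[150:], mean, comps), model)
print("test accuracy:", np.mean(pred == y[150:]))
```

With real application data, categorical fields such as marital status would first need a numeric encoding, and features are usually standardized before PCA so that large-valued fields like income do not dominate the components.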


Extending Defensive Distillation

arXiv.org Machine Learning

Deployed machine learning (ML) models are vulnerable to inputs maliciously perturbed to force them to mispredict [1, 2]. A class of such inputs, named adversarial examples, is systematically constructed through slight perturbations of otherwise correctly classified inputs [3, 4]. These perturbations are chosen to maximize the model's prediction error while leaving the semantics of the input unchanged. Although this often poses an intractable optimization problem for popular architectures like deep neural networks, heuristics allow the adversary to find effective perturbations, typically by evaluating the gradients of the model's output with respect to its inputs [3, 5]. To defend against adversarial examples, two classes of approaches exist.
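One well-known gradient-based heuristic of this kind is the fast gradient sign method (FGSM), which moves every input feature by a small step eps in the direction of the sign of the loss gradient. The sketch below applies it to a hand-set logistic-regression model rather than a deep network so that the gradient can be written in closed form; the weights, input and eps are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """P(class 1 | x) for a logistic-regression model with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(w, b, x, y):
    p = predict_proba(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Fast gradient sign perturbation of input x with true label y.

    For the log loss of logistic regression, d(loss)/dx_i = (p - y) * w_i,
    so each feature moves by eps in the direction that increases the loss."""
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


w, b = [2.0, -1.0], 0.0      # hypothetical trained weights
x, y = [1.0, 0.5], 1         # correctly classified input (p > 0.5)
x_adv = fgsm(w, b, x, y, eps=0.6)

print(round(predict_proba(w, b, x), 3))      # 0.818
print(round(predict_proba(w, b, x_adv), 3))  # 0.426
```

Here eps is large relative to the inputs only so that the label flip is visible in two dimensions.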


Boosting Factor-Specific Functional Historical Models for the Detection of Synchronisation in Bioelectrical Signals

arXiv.org Machine Learning

The link between different psychophysiological measures during emotion episodes is not well understood. To analyse the functional relationship between electroencephalography (EEG) and facial electromyography (EMG), we apply historical function-on-function regression models to EEG and EMG data that were simultaneously recorded from 24 participants while they were playing a computerised gambling task. Given the complexity of the data structure for this application, we extend simple functional historical models to models including random historical effects, factor-specific historical effects, and factor-specific random historical effects. Estimation is conducted by a component-wise gradient boosting algorithm, which scales well to large data sets and complex models.
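The estimation principle can be sketched in a much simpler scalar setting. Component-wise L2 boosting repeatedly fits every base learner to the current residuals (the negative gradient of the squared-error loss), updates only the best-fitting one by a small step nu, and thereby performs implicit variable selection. In the paper the base learners are functional historical effects; the single-covariate linear learners and toy data below are simplifying assumptions.

```python
def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Component-wise gradient boosting with squared-error loss.

    Each base learner is a least-squares fit y ~ beta * x_j on one centered
    covariate j; each iteration updates only the best-fitting covariate."""
    n, p = len(X), len(X[0])
    # Center covariates and response so base learners need no intercept.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    coef = [0.0] * p
    fitted = [0.0] * n
    for _ in range(n_iter):
        # Negative gradient of the squared-error loss = current residuals.
        resid = [yi - y_mean - fi for yi, fi in zip(y, fitted)]
        best_j, best_beta, best_rss = None, 0.0, float("inf")
        for j in range(p):
            xj = [row[j] for row in Xc]
            sxx = sum(v * v for v in xj)
            if sxx == 0:
                continue
            beta = sum(v * r for v, r in zip(xj, resid)) / sxx
            rss = sum((r - beta * v) ** 2 for v, r in zip(xj, resid))
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss
        if best_j is None:
            break
        # Shrunken update of the selected component only.
        coef[best_j] += nu * best_beta
        fitted = [fi + nu * best_beta * row[best_j] for fi, row in zip(fitted, Xc)]
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_means))
    return intercept, coef


# Toy data (hypothetical): y depends on the first covariate only.
X = [[i, (7 * i) % 5] for i in range(10)]
y = [1.0 + 3.0 * row[0] for row in X]
intercept, coef = componentwise_l2_boost(X, y)
print(round(intercept, 3), [round(c, 3) for c in coef])  # 1.0 [3.0, 0.0]
```

The irrelevant second covariate is never selected, which is the variable-selection property that makes this style of estimation attractive for models with many candidate effects.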


Comparison of Decision Tree Based Classification Strategies to Detect External Chemical Stimuli from Raw and Filtered Plant Electrical Response

arXiv.org Machine Learning

Plants monitor their surrounding environment and control their physiological functions by producing an electrical response. We recorded electrical signals from different plants by exposing them to sodium chloride (NaCl), ozone (O3) and sulfuric acid (H2SO4) under laboratory conditions. After applying pre-processing techniques such as filtering and drift removal, we extracted a few statistical features from the acquired plant electrical signals. Using these features, combined with different classification algorithms, we used a decision tree based multi-class classification strategy to identify the three different external chemical stimuli. We present our exploration to obtain the optimal combination of ranked features and classifier that can separate a particular chemical stimulus from the incoming stream of plant electrical signals. The paper also reports an exhaustive comparison of similar feature-based classification using the filtered and the raw plant signals as two different cases for feature extraction, where the raw signals contain both the high-frequency stochastic part and the low-frequency trends. The work presented in this paper opens up new possibilities for using plant electrical signals to monitor and detect environmental stimuli other than NaCl, O3 and H2SO4 in the future.
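The abstract does not list the exact filters or features. As an assumption-laden sketch of the drift-removal and feature-extraction steps, the Python below removes a linear drift by least squares and computes a handful of common summary statistics (a hypothetical feature set) from one signal window.

```python
from statistics import fmean, pstdev


def remove_linear_drift(signal):
    """Subtract the least-squares straight line (slow drift) from a signal."""
    n = len(signal)
    if n < 2:
        return [0.0] * n
    t_mean, s_mean = (n - 1) / 2, fmean(signal)
    stt = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(signal)) / stt
    return [s - (s_mean + slope * (t - t_mean)) for t, s in enumerate(signal)]


def statistical_features(signal):
    """A few simple summary statistics of one signal window (hypothetical set)."""
    m, sd = fmean(signal), pstdev(signal)
    skew = fmean(((x - m) / sd) ** 3 for x in signal) if sd > 0 else 0.0
    kurt = fmean(((x - m) / sd) ** 4 for x in signal) - 3 if sd > 0 else 0.0
    return {"mean": m, "std": sd, "skewness": skew,
            "excess_kurtosis": kurt, "range": max(signal) - min(signal)}


drifting = [2.0 + 0.5 * i for i in range(10)]   # pure linear drift
print(max(abs(v) for v in remove_linear_drift(drifting)) < 1e-12)  # True
print(statistical_features([1, -1, 1, -1]))
# {'mean': 0.0, 'std': 1.0, 'skewness': 0.0, 'excess_kurtosis': -2.0, 'range': 2}
```

Feature vectors of this kind, computed on either the raw or the detrended window, are what a tree-based classifier would then consume.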


Document Classification with scikit-learn

@machinelearnbot

Document classification is a fundamental machine learning task. It is used for all kinds of applications, like filtering spam, routing support requests to the right support rep, language detection, genre classification, sentiment analysis, and many more. To demonstrate text classification with scikit-learn, we're going to build a simple spam filter. While the filters in production for services like Gmail are vastly more sophisticated, the model we'll have by the end of this tutorial is effective and surprisingly accurate. Spam filtering is kind of like the "Hello world" of document classification. Keep in mind, however, that you aren't limited to two classes.
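scikit-learn provides building blocks such as `CountVectorizer` and `MultinomialNB` for this kind of task. To show what a bag-of-words Naive Bayes spam filter does under the hood, here is a dependency-free Python sketch with Laplace smoothing. The four training messages are invented, and this is not the tutorial's code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in tokenize(d)}
    log_prior, log_lik = {}, {}
    for c in classes:
        class_docs = [d for d, lab in zip(docs, labels) if lab == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in tokenize(d))
        total = sum(counts.values())
        log_lik[c] = {w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                      for w in vocab}
    return classes, vocab, log_prior, log_lik


def predict_nb(model, text):
    """Return the class with the highest log posterior; unseen words are ignored."""
    classes, vocab, log_prior, log_lik = model
    tokens = [w for w in tokenize(text) if w in vocab]
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in tokens) for c in classes}
    return max(scores, key=scores.get)


docs = ["win a free prize now", "free money click now",
        "meeting agenda for monday", "lunch with the team on monday"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb(model, "claim your free prize"))   # spam
print(predict_nb(model, "team meeting on monday"))  # ham
```

Nothing here is tied to two classes: `train_nb` estimates one prior and one word distribution per label, so adding a third label works the same way.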