AITopics

@machinelearnbot May-16-2017, 11:30:10 GMT

How to create text classifiers with Machine Learning

Building a quality machine learning model for text classification can be a challenging process. You need to build a training dataset, test different parameters for your model, and fix the confusions, among other things. In this post, we describe how you can successfully train text classifiers with machine learning using MonkeyLearn. What are the categories or tags that you want to assign to your texts? This is the first question you need to answer when you start working on your text classifier.

artificial intelligence, category, machine learning, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

AAAI Conferences May-16-2017

On ROC Curve Analysis of Artificial Neural Network Classifiers

Kim, Chulwoo (Pace University) | Cha, Sung-Hyuk (Computer Science Department Pace University) | An, Yoo Jung (Essex County College) | Wilson, Ned (Essex County College)

Receiver operating characteristic (ROC) curves are of great interest in evaluating many security systems such as biometric authentication. They visualize the trade-off between the number of security breaches and the level of convenience. In earlier work, ROC curves and their decision boundaries were studied for various classifiers. Here, further studies are conducted to identify problems of ROC curve analysis when artificial neural network (ANN) classifiers' net values are used. Graphical decision boundaries and experimental results on the IRIS biometric authentication system reveal over-fitting in the ROC curve analysis. These graphical decision boundaries suggest that ANN classifiers with two output units are more desirable than those with a single output unit for two-class classification problems.
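
As a rough illustration of the trade-off an ROC curve visualizes, the following minimal sketch sweeps a decision threshold over classifier scores and records the false positive rate and true positive rate at each step. The scores and labels are made-up toy values, and the helper names `roc_points` and `auc` are hypothetical; this is not the analysis used in the paper.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold.

    `labels` holds 1 for genuine (positive) samples and 0 for impostor
    (negative) samples. A sample is accepted when its score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing accepted
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (assumed for illustration only).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC={auc(points):.3f}")
```

In an authentication setting, a false positive corresponds to accepting an impostor (a security breach) and a missed true positive corresponds to rejecting a genuine user (a loss of convenience), so moving the threshold trades one against the other.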

artificial neural network classifier, roc curve analysis

The Thirtieth International Flairs Conference

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

Probst, Philipp, Boulesteix, Anne-Laure

To tune or not to tune the number of trees in random forest?

arXiv.org Machine Learning May-16-2017

The number of trees T in the random forest (RF) algorithm for supervised learning has to be set by the user. It is controversial whether T should simply be set to the largest computationally manageable value or whether a smaller T may in some cases be better. While the principle underlying bagging is that "more trees are better", in practice the classification error rate sometimes reaches a minimum before increasing again as the number of trees grows. The goal of this paper is four-fold: (i) providing theoretical results showing that the expected error rate may be a non-monotonous function of the number of trees and explaining under which circumstances this happens; (ii) providing theoretical results showing that such non-monotonous patterns cannot be observed for other performance measures such as the Brier score and the logarithmic loss (for classification) and the mean squared error (for regression); (iii) illustrating the extent of the problem through an application to a large number (n = 306) of datasets from the public database OpenML; (iv) finally arguing in favor of setting T to a computationally feasible large number, depending on convergence properties of the desired performance measure.
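
The non-monotone behaviour of the error rate can be reproduced in a toy setting. The sketch below is a simplification, not the paper's derivation: it assumes that each tree votes for the correct class independently with a fixed probability p per observation, that T is odd (so there are no ties), and that the Brier score is the squared distance between the fraction of correct votes and 1. The two p values and all function names are illustrative assumptions.

```python
from math import comb


def majority_vote_error(p, trees):
    """P(the majority of `trees` independent votes is wrong).

    Each vote is correct with probability p; `trees` must be odd.
    The majority is wrong when at most trees // 2 votes are correct.
    """
    return sum(comb(trees, k) * p**k * (1 - p)**(trees - k)
               for k in range(trees // 2 + 1))


def expected_brier(p, trees):
    """E[(1 - F)^2], where F is the fraction of correct votes.

    F has mean p and variance p(1 - p) / trees under the assumptions above.
    """
    return (1 - p)**2 + p * (1 - p) / trees


# Two hypothetical observations: one easy (p = 0.9), one slightly hard (p = 0.45).
observations = [0.9, 0.45]


def mean_error(trees):
    return sum(majority_vote_error(p, trees) for p in observations) / len(observations)


def mean_brier(trees):
    return sum(expected_brier(p, trees) for p in observations) / len(observations)


for trees in (1, 3, 5, 7, 11, 51, 101):
    print(f"T={trees:3d}  error={mean_error(trees):.4f}  brier={mean_brier(trees):.4f}")
```

Under these assumptions the mean error rate first drops (the easy observation is fixed quickly) and then climbs (the hard observation is misclassified ever more reliably), while the mean Brier score only decreases as T grows.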

artificial intelligence, decision tree learning, machine learning, (19 more...)

1705.05654

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.61)

@machinelearnbot May-15-2017, 15:30:09 GMT

The Best Metric to Measure Accuracy of Classification Models

Unlike evaluating the accuracy of models that predict a continuous or discrete dependent variable, such as Linear Regression models, evaluating the accuracy of a classification model can be more complex and time-consuming. Before measuring the accuracy of a classification model, an analyst would first measure its robustness with the help of metrics such as AIC-BIC, AUC-ROC, AUC-PR, the Kolmogorov-Smirnov chart, etc. The next logical step is to measure its accuracy. To understand the complexity behind measuring the accuracy, we need to know a few basic concepts. For example, a classification model like Logistic Regression will output a probability between 0 and 1 instead of the desired output, the actual target variable such as Yes/No.
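
To make the last point concrete, here is a minimal sketch of turning predicted probabilities into Yes/No labels with a cut-off and then measuring accuracy. The probabilities, the true labels, and the helper names `classify` and `accuracy` are made up for illustration; the article may use a different procedure.

```python
def classify(probabilities, threshold=0.5):
    """Map each predicted probability to "Yes" or "No" using a cut-off."""
    return ["Yes" if p >= threshold else "No" for p in probabilities]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy model outputs and true labels (assumed for illustration only).
probs = [0.92, 0.35, 0.61, 0.48, 0.10]
actual = ["Yes", "No", "No", "Yes", "No"]

for threshold in (0.4, 0.5, 0.7):
    predicted = classify(probs, threshold)
    print(f"threshold={threshold}: {predicted} accuracy={accuracy(predicted, actual):.2f}")
```

The same probabilities give different accuracies at different cut-offs, which is one reason measuring the accuracy of a classifier takes more care than it first appears.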

artificial intelligence, classification model, machine learning, (15 more...)

Genre: Research Report > Experimental Study (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.91)

@machinelearnbot May-15-2017, 01:45:05 GMT

Loan Prediction – Using PCA and Naive Bayes Classification with R

Nowadays, there are numerous risks related to bank loans, both for the banks and for the borrowers taking out the loans. Risk analysis for bank loans requires an understanding of the risk and the risk level. Banks need to analyze their customers for loan eligibility so that they can specifically target those customers. Banks want to automate the loan eligibility process (in real time) based on customer details such as Gender, Marital Status, Age, Occupation, Income, debts, and others provided in their online application form. As the number of transactions in the banking sector is growing rapidly and huge data volumes are available, customers' behavior can be easily analyzed and the risks around loans can be reduced.

artificial intelligence, machine learning, naïve bayes classification, (11 more...)

Industry: Banking & Finance > Loans (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Papernot, Nicolas, McDaniel, Patrick

Extending Defensive Distillation

arXiv.org Machine Learning May-15-2017

Deployed machine learning (ML) models are vulnerable to inputs maliciously perturbed to force them to mispredict [1, 2]. A class of such inputs, named adversarial examples, are systematically constructed through slight perturbations of otherwise correctly classified inputs [3, 4]. These perturbations are chosen to maximize the model's prediction error while leaving the semantics of the input unchanged. Although this often poses a non-tractable optimization problem for popular architectures like deep neural networks, heuristics allow the adversary to find effective perturbations--typically through the evaluation of gradients of the model's output with respect to its inputs [3, 5]. To defend against adversarial examples, two classes of approaches exist.
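
The gradient-based heuristic mentioned above can be sketched on a tiny logistic model. This is an illustrative toy, not the paper's method: the weights, the input, and the step size `epsilon` are made-up values, and the update shown is a single signed-gradient step of the kind described in the adversarial examples literature.

```python
from math import exp

# Hypothetical fixed logistic model: p(y = 1 | x) = sigmoid(w . x + b).
weights = [2.0, -1.0, 0.5]
bias = 0.0


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def probability(x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def predict(x):
    return 1 if probability(x) >= 0.5 else 0


def sign(v):
    return (v > 0) - (v < 0)


def adversarial_step(x, label, epsilon):
    """Move each input feature by epsilon in the direction that raises the loss.

    For the cross-entropy loss of a logistic model, the gradient with
    respect to the input is (p - label) * w.
    """
    p = probability(x)
    gradient = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


x = [1.0, 0.5, 1.0]   # correctly classified as class 1
epsilon = 0.7         # large for a toy model; real attacks use small steps
x_adv = adversarial_step(x, label=1, epsilon=epsilon)

print(predict(x), round(probability(x), 3))
print(predict(x_adv), round(probability(x_adv), 3))
```

Each feature moves by at most `epsilon`, yet the prediction flips, because every feature is pushed in the direction the model is most sensitive to.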

adversarial example, artificial intelligence, machine learning, (16 more...)

1705.05264

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Rügamer, David, Brockhaus, Sarah, Gentsch, Kornelia, Scherer, Klaus, Greven, Sonja

Boosting Factor-Specific Functional Historical Models for the Detection of Synchronisation in Bioelectrical Signals

arXiv.org Machine Learning May-13-2017

The link between different psychophysiological measures during emotion episodes is not well understood. To analyse the functional relationship between electroencephalography (EEG) and facial electromyography (EMG), we apply historical function-on-function regression models to EEG and EMG data that were simultaneously recorded from 24 participants while they were playing a computerised gambling task. Given the complexity of the data structure for this application, we extend simple functional historical models to models including random historical effects, factor-specific historical effects, and factor-specific random historical effects. Estimation is conducted by a component-wise gradient boosting algorithm, which scales well to large data sets and complex models.

artificial intelligence, historical effect, machine learning, (17 more...)

1609.0607

Country:

North America > United States (0.46)
Europe > Germany (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Chatterjee, Shre Kumar, Das, Saptarshi, Maharatna, Koushik, Masi, Elisa, Santopolo, Luisa, Colzi, Ilaria, Mancuso, Stefano, Vitaletti, Andrea

Comparison of Decision Tree Based Classification Strategies to Detect External Chemical Stimuli from Raw and Filtered Plant Electrical Response

arXiv.org Machine Learning May-13-2017

Plants monitor their surrounding environment and control their physiological functions by producing an electrical response. We recorded electrical signals from different plants by exposing them to Sodium Chloride (NaCl), Ozone (O3) and Sulfuric Acid (H2SO4) under laboratory conditions. After applying pre-processing techniques such as filtering and drift removal, we extracted a few statistical features from the acquired plant electrical signals. Using these features, combined with different classification algorithms, we used a decision tree based multi-class classification strategy to identify the three different external chemical stimuli. We present here our exploration to obtain the optimum combination of ranked features and classifier that can separate a particular chemical stimulus from the incoming stream of plant electrical signals. The paper also reports an exhaustive comparison of similar feature-based classification using the filtered and the raw plant signals, the latter containing both the high-frequency stochastic part and the low-frequency trends, as two different cases for feature extraction. The work presented in this paper opens up new possibilities for using plant electrical signals to monitor and detect other environmental stimuli apart from NaCl, O3 and H2SO4 in the future.

artificial intelligence, classifier, machine learning, (19 more...)

doi: 10.1016/j.snb.2017.04.071

1707.0762

Country: Europe (0.46)

Genre: Research Report > New Finding (0.92)

Industry:

Health & Medicine > Diagnostic Medicine (0.93)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.62)

@machinelearnbot May-9-2017, 18:05:06 GMT

Document Classification with scikit-learn

Document classification is a fundamental machine learning task. It is used for all kinds of applications, like filtering spam, routing support requests to the right support rep, language detection, genre classification, sentiment analysis, and many more. To demonstrate text classification with scikit-learn, we're going to build a simple spam filter. While the filters in production for services like Gmail are vastly more sophisticated, the model we'll have by the end of this tutorial is effective and surprisingly accurate. Spam filtering is kind of like the "Hello world" of document classification. However, something to be aware of is that you aren't limited to two classes.
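
A spam filter of this kind can be sketched in a few lines of scikit-learn. The example below is a minimal illustration under stated assumptions: the four training messages are invented, and the choice of a bag-of-words `CountVectorizer` followed by `MultinomialNB` is an assumption, since the excerpt does not say which features or classifier the tutorial uses.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set; a real filter needs thousands of messages.
train_texts = [
    "win free money now",
    "free prize claim now",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Turn each message into word counts, then fit a naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

predictions = model.predict(["free money prize", "team meeting tomorrow"])
print(list(predictions))
```

Because the labels are plain strings, the same pipeline handles more than two classes without changes: adding messages labelled, for example, "newsletter" is enough.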

classifier, machine learning, natural language, (18 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)