AITopics | Performance Analysis

Performance Analysis

An in-depth guide to supervised machine learning classification

#artificialintelligence, Dec-16-2019, 09:38:19 GMT

In supervised learning, algorithms learn from labeled data. After learning from that data, the algorithm determines which label to assign to new, unlabeled data by matching it against the patterns it has learned. Supervised learning can be divided into two categories: classification and regression. Examples of classification include spam detection, churn prediction, sentiment analysis and dog breed detection. Examples of regression include house price prediction, stock price prediction and height-weight prediction.

classification, classifier, prediction, (17 more...)

#artificialintelligence

Industry: Banking & Finance (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)

Evaluating Usage of Images for App Classification

Singla, Kushal, Mukherjee, Niloy, Koduvely, Hari Manassery, Bose, Joy

arXiv.org Machine Learning, Dec-16-2019

App classification is useful in a number of applications, such as adding apps to an app store or building a user model based on the installed apps. There are a number of existing methods that classify apps against a given taxonomy on the basis of their text metadata. However, text-based methods for app classification may not work in all cases, such as when the text descriptions are in a different language, missing, or inadequate to classify the app. One solution in such cases is to use the app images to supplement the text description. In this paper, we evaluate a number of approaches in which app images can be used to classify the apps. In one approach, we use optical character recognition (OCR) to extract text from images, which is then used to supplement the text description of the app. In another, we use pic2vec to convert the app images into vectors, then train an SVM to classify the vectors to the correct app label. In another, we use the captionbot.ai tool to generate natural language descriptions from the app images. Finally, we use a method to detect and label objects in the app images and use a voting technique to determine the category of the app based on all the images. We compare the performance of our image-based techniques in classifying a number of apps in our dataset. We use a text-based SVM app classifier as our baseline and obtain an improved classification accuracy of 96% for some classes when app images are added.

app, app image, classification, (14 more...)

arXiv.org Machine Learning

1912.12144

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Geneva > Geneva (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.35)

Predicting the Outcome of Judicial Decisions made by the European Court of Human Rights

O'Sullivan, Conor, Beel, Joeran

arXiv.org Machine Learning, Dec-16-2019

In this study, machine learning models were constructed to predict whether judgments made by the European Court of Human Rights (ECHR) would lead to a violation of an Article in the Convention on Human Rights. The problem is framed as a binary classification task where a judgment can lead to a "violation" or "non-violation" of a particular Article. Using auto-sklearn, an automated algorithm selection package, models were constructed for 12 Articles in the Convention. To train these models, textual features were obtained from the ECHR Judgment documents using N-grams, word embeddings and paragraph embeddings. Additional documents, from the ECHR, were incorporated into the models through the creation of a word embedding (echr2vec) and a doc2vec model. The features obtained using the echr2vec embedding provided the highest cross-validation accuracy for 5 of the Articles. The overall test accuracy, across the 12 Articles, was 68.83%. As far as we could tell, this is the first estimate of the accuracy of such machine learning models using a realistic test set. This provides an important benchmark for future work. As a baseline, a simple heuristic of always predicting the most common outcome in the past was used. The heuristic achieved an overall test accuracy of 86.68% which is 29.7% higher than the models. Again, this was seemingly the first study that included such a heuristic with which to compare model results. The higher accuracy achieved by the heuristic highlights the importance of including such a baseline.
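
The baseline described above (always predicting the most common past outcome) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the toy labels are hypothetical, and the numbers below are not taken from the paper.

```python
from collections import Counter


def majority_baseline(past_labels):
    """Return the most frequent label among past (training) outcomes."""
    if not past_labels:
        raise ValueError("past_labels must not be empty")
    return Counter(past_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: outcomes of earlier and later judgments for one Article.
past = ["violation"] * 8 + ["non-violation"] * 2
future = ["violation"] * 7 + ["non-violation"] * 3

baseline_label = majority_baseline(past)
baseline_preds = [baseline_label] * len(future)
baseline_acc = accuracy(future, baseline_preds)
```

On imbalanced outcomes such a constant predictor can score high accuracy, which is why a trained model's accuracy is only informative when compared against it.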

accuracy, judgment, violation, (15 more...)

arXiv.org Machine Learning

1912.10819

Country:

North America > United States > New York (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)
Europe > Middle East > Malta > Port Region > Southern Harbour District > Valletta (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Law > Civil Rights & Constitutional Law (0.83)
Law > International Law (0.71)
Government > Intergovernmental Programs (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)

Multi-stream Data Analytics for Enhanced Performance Prediction in Fantasy Football

Bonello, Nicholas, Beel, Joeran, Lawless, Seamus, Debattista, Jeremy

arXiv.org Machine Learning, Dec-16-2019

Fantasy Premier League (FPL) performance predictors tend to base their algorithms purely on historical statistical data. The main problem with this approach is that external factors such as injuries, managerial decisions and other tournament match statistics can never be factored into the final predictions. In this paper, we present a new method for predicting future player performances by automatically incorporating human feedback into our model. Through statistical data analysis such as previous performances, upcoming fixture difficulty ratings, betting market analysis, and opinions of the general public and experts alike via social media and web articles, we can improve our understanding of who is likely to perform well in upcoming matches. When tested on the English Premier League 2018/19 season, the model outperformed regular statistical predictors by over 300 points, an average of 11 points per week, ranking within the top 0.5% of players (rank 30,000 out of over 6.5 million players).

different data source, gameweek, prediction, (11 more...)

arXiv.org Machine Learning

1912.07441

Country: Europe > Ireland > Leinster > County Dublin > Dublin (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)

Fairness Assessment for Artificial Intelligence in Financial Industry

Zhang, Yukun, Zhou, Longsheng

arXiv.org Machine Learning, Dec-16-2019

Artificial Intelligence (AI) is an important driving force for the development and transformation of the financial industry. However, with fast-evolving AI technology and applications, unintentional bias, insufficient model validation, immature contingency plans and other underestimated threats may expose a company to operational and reputational risks. In this paper, we focus on fairness evaluation, one of the key components of AI Governance, through a quantitative lens. Statistical methods are reviewed for imbalanced data treatment and bias mitigation. These methods and fairness evaluation metrics are then applied to a credit card default payment example.

algorithm, imbalanced data, probability, (15 more...)

arXiv.org Machine Learning

1912.07211

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Africa (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Banking & Finance > Financial Services (0.46)
Banking & Finance > Credit (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

NaïveRole: Author-Contribution Extraction and Parsing from Biomedical Manuscripts

Tkaczyk, Dominika, Collins, Andrew, Beel, Joeran

arXiv.org Machine Learning, Dec-15-2019

Information about the contributions of individual authors to scientific publications is important for assessing authors' achievements. Some biomedical publications have a short section that describes authors' roles and contributions. It is usually written in natural language and hence author contributions cannot be trivially extracted in a machine-readable format. In this paper, we present 1) a statistical analysis of roles in author contributions sections, and 2) NaïveRole, a novel approach to extract structured authors' roles from author contribution sections. For the first part, we used co-clustering techniques, as well as Open Information Extraction, to semi-automatically discover the popular roles within a corpus of 2,000 contributions sections from PubMed Central. The discovered roles were used to automatically build a training set for NaïveRole, our role extractor approach, based on Naïve Bayes. NaïveRole extracts roles with a micro-averaged precision of 0.68, recall of 0.48 and F1 of 0.57. It is, to the best of our knowledge, the first attempt to automatically extract author roles from research papers. This paper is an extended version of a previous poster published at JCDL 2018.

corpus, manuscript, role mention, (15 more...)

arXiv.org Machine Learning

1912.1017

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Diagnosis of liver disease using computer-assisted imaging techniques: A Review

Kalejahi, Behnam Kiani, Meshgini, Saeed, Daneshvar, Sabalan, Asadzadeh, Shiva

arXiv.org Machine Learning, Dec-15-2019

The evidence indicates that CAD is one of the most efficient techniques for liver disease detection, but most recent studies lack a clear organization and the performance parameters needed to present the analysis of their results. Only a few of the papers include benchmarked studies, even though benchmarking helps a reader understand under which circumstances the experimental results are better and useful for future implementation and adoption of the work. Liver diseases and image processing algorithms, especially in medicine, are among the most important topics of the day. Unfortunately, the data referred to in the articles are scarce in this area, and policies need to be revised and implemented in order to gather more data and do more research in this field. Detection with ultrasound is common in liver diseases and depends on the physician's experience and skills. CAD systems are very important in helping doctors understand medical images and improve the accuracy of diagnosing various diseases. In the following, we describe the techniques used in the various stages of a CAD system, namely feature extraction, feature selection, and classification. Although many techniques are used to classify medical images, creating a universally accepted approach remains a challenge.

classification, liver disease, ultrasound image, (16 more...)

arXiv.org Machine Learning

1912.09572

Country:

Europe (0.14)
Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.05)
North America > United States > New Jersey (0.04)
Asia > Azerbaijan > Baku Economic Region > Baku (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Therapeutic Area > Hepatology (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.67)

A novel spike-and-wave automatic detection in EEG signals

Quintero-Rincón, Antonio, Muro, Valeria, D'Giano, Carlos, Prendes, Jorge, Batatia, Hadj

arXiv.org Machine Learning, Dec-15-2019

Spike-and-wave discharge (SWD) pattern classification in electroencephalography (EEG) signals is a key problem in signal processing. It is particularly important to develop an automatic SWD detection method for long-term EEG recordings, since the task of marking the patterns manually is time-consuming, difficult and error-prone. This paper presents a new detection method with a low computational complexity that can be easily trained if standard medical protocols are respected. The detection procedure is as follows: First, each EEG signal is divided into several time segments and, for each time segment, the Morlet 1-D decomposition is applied. Then three parameters are extracted from the wavelet coefficients of each segment: scale (using a generalized Gaussian statistical model), variance and median. This is followed by a k-nearest neighbors (k-NN) classifier to detect the spike-and-wave pattern in each EEG channel from these three parameters. A total of 106 spike-and-wave and 106 non-spike-and-wave segments were used for training, while 69 new annotated EEG segments from six subjects were used for classification. In these circumstances, the proposed methodology achieved 100% accuracy. These results generate new research opportunities for the underlying causes of the so-called absence epilepsy in long-term EEG recordings.
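
The final classification step described above can be illustrated with a small k-NN sketch over three-dimensional feature vectors (scale, variance, median). This is only an illustration under stated assumptions: the abstract does not give the value of k or the distance metric, so k=3 and Euclidean distance are assumed here, the feature values are made up, and the wavelet feature extraction itself is not shown.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points.

    Uses Euclidean distance (an assumption; the abstract names no metric).
    """
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training point's distance to x with its label, nearest first.
    neighbours = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (scale, variance, median) features for six training segments.
train_X = [
    (0.90, 4.0, 0.50), (1.00, 4.2, 0.60), (1.10, 3.9, 0.40),  # spike-and-wave
    (0.20, 0.5, 0.00), (0.30, 0.6, 0.10), (0.25, 0.4, 0.05),  # non-spike-and-wave
]
train_y = ["swd", "swd", "swd", "non-swd", "non-swd", "non-swd"]

label_a = knn_predict(train_X, train_y, (0.95, 4.1, 0.50), k=3)
label_b = knn_predict(train_X, train_y, (0.20, 0.5, 0.05), k=3)
```

Because the three features can live on very different scales, a practical version would likely standardize them before computing distances.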

classification, detection, eeg signal, (15 more...)

arXiv.org Machine Learning

1912.07123

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.92)
Health & Medicine > Therapeutic Area > Genetic Disease (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Breast Cancer Diagnosis by Higher-Order Probabilistic Perceptrons

Cowsik, Aditya, Clark, John W.

arXiv.org Machine Learning, Dec-14-2019

A two-layer neural network model that systematically includes correlations among input variables to arbitrary order and is designed to implement Bayes inference has been adapted to classify breast cancer tumors as malignant or benign, assigning a probability for either outcome. The inputs to the network represent measured characteristics of cell nuclei imaged in Fine Needle Aspiration biopsies. The present machine-learning approach to diagnosis (known as HOPP, for higher-order probabilistic perceptron) is tested on the much-studied, open-access Breast Cancer Wisconsin (Diagnosis) Data Set of Wolberg et al. This set lists, for each tumor, measured physical parameters of the cell nuclei of each sample. The HOPP model can identify the key factors -- input features and their combinations -- most relevant for reliable diagnosis. HOPP networks were trained on 90% of the examples in the Wisconsin database, and tested on the remaining 10%. Referred to ensembles of 300 networks, selected randomly for cross-validation, accuracy of classification for the test sets of up to 97% was readily achieved, with standard deviation around 2%, together with average Matthews correlation coefficients reaching 0.94 indicating excellent predictive performance. Demonstrably, the HOPP is capable of matching the predictive power attained by other advanced machine-learning algorithms applied to this much-studied database, over several decades. Analysis shows that in this special problem, which is almost linearly separable, the effects of irreducible correlations among the measured features of the Wisconsin database are of relatively minor importance, as the Naive Bayes approximation can itself yield predictive accuracy approaching 95%. The advantages of the HOPP algorithm will be more clearly revealed in application to more challenging machine-learning problems.
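
For readers unfamiliar with the Matthews correlation coefficient cited above, the sketch below computes it from the four cells of a binary confusion matrix using the standard definition. The function name and the example counts are hypothetical and are not results from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when a marginal count is zero, a
    common convention for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical held-out set of 57 tumors: 21 malignant, 36 benign,
# with one false positive and one false negative.
example_mcc = mcc(tp=20, tn=35, fp=1, fn=1)
```

Unlike plain accuracy, this score uses all four cells of the confusion matrix, so it stays informative when the two classes are unevenly represented.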

diagnosis, neural network, probability, (16 more...)

arXiv.org Machine Learning

1912.06969

Country:

North America > United States > Wisconsin (0.65)
Europe > Portugal > Madeira > Funchal (0.04)
North America > United States > New York > New York County > New York City (0.04)
(9 more...)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

MM Algorithms for Distance Covariance based Sufficient Dimension Reduction and Sufficient Variable Selection

Wu, Runxiong, Chen, Xin

arXiv.org Machine Learning, Dec-13-2019

Sufficient dimension reduction (SDR) using distance covariance (DCOV) was recently proposed as an approach to dimension-reduction problems. Compared with other SDR methods, it is model-free, requires no estimation of a link function, and does not require any particular distribution on the predictors (see Sheng and Yin, 2013, 2016). However, the DCOV-based SDR method involves optimizing a nonsmooth and nonconvex objective function over the Stiefel manifold. To tackle the numerical challenge, we give a novel, equivalent reformulation of the original objective function as a DC (Difference of Convex functions) program and construct an iterative algorithm based on the majorization-minimization (MM) principle. At each step of the MM algorithm, we inexactly solve the quadratic subproblem on the Stiefel manifold by taking one iteration of Riemannian Newton's method. The algorithm can also be readily extended to sufficient variable selection (SVS) using distance covariance. We establish the convergence property of the proposed algorithm under some regularity conditions. Simulation studies show our algorithm drastically improves the computation efficiency and is robust across various settings compared with the existing method. Supplemental materials for this article are available.
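
As background for the MM principle and DC programs mentioned above, the generic textbook scheme can be written as follows. This is not the paper's specific construction: the symbols f, u, v, g, x_k and s_k are introduced here for illustration only, minimization is assumed as the convention, and the Stiefel-manifold constraint is omitted.

```latex
% Generic MM iteration for minimizing f: g(. | x_k) majorizes f at the iterate x_k.
g(x \mid x_k) \ge f(x) \ \text{for all } x, \qquad g(x_k \mid x_k) = f(x_k), \qquad
x_{k+1} = \arg\min_x \, g(x \mid x_k)

% Descent property that follows from the two majorization conditions:
f(x_{k+1}) \le g(x_{k+1} \mid x_k) \le g(x_k \mid x_k) = f(x_k)

% For a DC objective f = u - v with u, v convex and s_k a subgradient of v at x_k,
% linearizing v gives a valid majorizer, because v(x) \ge v(x_k) + s_k^\top (x - x_k):
g(x \mid x_k) = u(x) - v(x_k) - s_k^\top (x - x_k)
```

Each iteration therefore replaces the nonconvex problem with a convex surrogate problem whose solution cannot increase the original objective.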

algorithm, equation, log null 1, (14 more...)

arXiv.org Machine Learning

1912.06342

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
