AITopics | Accuracy

Collaborating Authors

Accuracy

News Overviews Instructional Materials AI-Alerts Classics

Practical Guide to deal with Imbalanced Classification Problems in R

#artificialintelligenceMar-31-2016, 19:43:06 GMT

We have several machine learning algorithms at our disposal for model building. Doing data based prediction is now easier like never before. Whether it is a regression or classification problem, one can effortlessly achieve a reasonably high accuracy using a suitable algorithm. But, this is not the case everytime. Classification problems can sometimes get a bit tricky. ML algorithms tend to tremble when faced with imbalanced classification data sets. Moreover, they result in biased predictions and misleading accuracies. But, why does it happen? What factors deteriorate their performance?

accuracy, artificial intelligence, machine learning, (17 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Sparse Representation of Multivariate Extremes with Applications to Anomaly Ranking

Goix, Nicolas, Sabourin, Anne, Clémençon, Stéphan

arXiv.org Machine LearningMar-31-2016

Extremes play a special role in Anomaly Detection. Beyond inference and simulation purposes, probabilistic tools borrowed from Extreme Value Theory (EVT), such as the angular measure, can also be used to design novel statistical learning methods for Anomaly Detection/ranking. This paper proposes a new algorithm based on multivariate EVT to learn how to rank observations in a high dimensional space with respect to their degree of 'abnormality'. The procedure relies on an original dimension-reduction technique in the extreme domain that possibly produces a sparse representation of multivariate extremes and allows to gain insight into the dependence structure thereof, escaping the curse of dimensionality. The representation output by the unsupervised methodology we propose here can be combined with any Anomaly Detection technique tailored to non-extreme data. As it performs linearly with the dimension and almost linearly in the data (in O(dn log n)), it fits to large scale problems. The approach in this paper is novel in that EVT has never been used in its multivariate version in the field of Anomaly Detection. Illustrative experimental results provide strong empirical evidence of the relevance of our approach.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1603.09584

Country: Europe (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Some Machine Learning Concepts For Beginners

#artificialintelligenceMar-30-2016, 04:30:22 GMT

Let's start with some basic concepts, Machine learning as a general concept, is fairly simple and is similar to how humans learn. Machines teach themselves based on patterns that they "see" in data or images, giving them the ability to program themselves. The efficiency of machine learning is measured primarily in the variables of precision and recall. The easiest way to think of precision is with the AI you probably interact with most frequently: a search engine. Let's say that you do a Google search for "purple polka dotted cat bed" and that gets you 50 results, and of those results, only 25 are actually relevant (ie. 25 of those web pages have purple polka dotted cat beds).

artificial intelligence, purple polka dotted cat bed, search engine, (6 more...)

#artificialintelligence

Industry: Information Technology (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.39)

Add feedback

Interpretability of Multivariate Brain Maps in Brain Decoding: Definition and Quantification

Kia, Seyed Mostafa

arXiv.org Machine LearningMar-29-2016

Brain decoding is a popular multivariate approach for hypothesis testing in neuroimaging. It is well known that the brain maps derived from weights of linear classifiers are hard to interpret because of high correlations between predictors, low signal to noise ratios, and the high dimensionality of neuroimaging data. Therefore, improving the interpretability of brain decoding approaches is of primary interest in many neuroimaging studies. Despite extensive studies of this type, at present, there is no formal definition for interpretability of multivariate brain maps. As a consequence, there is no quantitative measure for evaluating the interpretability of different brain decoding methods. In this paper, first, we present a theoretical definition of interpretability in brain decoding; we show that the interpretability of multivariate brain maps can be decomposed into their reproducibility and representativeness. Second, as an application of the proposed theoretical definition, we formalize a heuristic method for approximating the interpretability of multivariate brain maps in a binary magnetoencephalography (MEG) decoding scenario. Third, we propose to combine the approximated interpretability and the performance of the brain decoding model into a new multi-objective criterion for model selection. Our results for the MEG data show that optimizing the hyper-parameters of the regularized linear classifier based on the proposed criterion results in more informative multivariate brain maps. More importantly, the presented definition provides the theoretical background for quantitative evaluation of interpretability, and hence, facilitates the development of more effective brain decoding algorithms in the future.

artificial intelligence, interpretability, machine learning, (19 more...)

arXiv.org Machine Learning

1603.08704

Country:

Europe (1.00)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Zero Shot Recognition with Unreliable Attributes

Jayaraman, Dinesh, Grauman, Kristen

arXiv.org Machine LearningMar-29-2016

In principle, zero-shot learning makes it possible to train a recognition model simply by specifying the category's attributes. For example, with classifiers for generic attributes like \emph{striped} and \emph{four-legged}, one can construct a classifier for the zebra category by enumerating which properties it possesses---even without providing zebra training images. In practice, however, the standard zero-shot paradigm suffers because attribute predictions in novel images are hard to get right. We propose a novel random forest approach to train zero-shot models that explicitly accounts for the unreliability of attribute predictions. By leveraging statistics about each attribute's error tendencies, our method obtains more robust discriminative models for the unseen classes. We further devise extensions to handle the few-shot scenario and unreliable attribute descriptions. On three datasets, we demonstrate the benefit for visual category learning with zero or few training examples, a critical domain for rare categories or categories defined on the fly.

large language model, machine learning, natural language, (22 more...)

arXiv.org Machine Learning

1409.4327

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Add feedback

Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja's Algorithm

Jain, Prateek, Jin, Chi, Kakade, Sham M., Netrapalli, Praneeth, Sidford, Aaron

arXiv.org Machine LearningMar-28-2016

This work provides improved guarantees for streaming principle component analysis (PCA). Given $A_1, \ldots, A_n\in \mathbb{R}^{d\times d}$ sampled independently from distributions satisfying $\mathbb{E}[A_i] = \Sigma$ for $\Sigma \succeq \mathbf{0}$, this work provides an $O(d)$-space linear-time single-pass streaming algorithm for estimating the top eigenvector of $\Sigma$. The algorithm nearly matches (and in certain cases improves upon) the accuracy obtained by the standard batch method that computes top eigenvector of the empirical covariance $\frac{1}{n} \sum_{i \in [n]} A_i$ as analyzed by the matrix Bernstein inequality. Moreover, to achieve constant accuracy, our algorithm improves upon the best previous known sample complexities of streaming algorithms by either a multiplicative factor of $O(d)$ or $1/\mathrm{gap}$ where $\mathrm{gap}$ is the relative distance between the top two eigenvalues of $\Sigma$. These results are achieved through a novel analysis of the classic Oja's algorithm, one of the oldest and most popular algorithms for streaming PCA. In particular, this work shows that simply picking a random initial point $w_0$ and applying the update rule $w_{i + 1} = w_i + \eta_i A_i w_i$ suffices to accurately estimate the top eigenvector, with a suitable choice of $\eta_i$. We believe our result sheds light on how to efficiently perform streaming PCA both in theory and in practice and we hope that our analysis may serve as the basis for analyzing many variants and extensions of streaming PCA.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1602.06929

Country: North America > Canada (0.67)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)

Add feedback

"Did I Say Something Wrong?" A Word-Level Analysis of Wikipedia Articles for Deletion Discussions

Ruster, Michael

arXiv.org Machine LearningMar-25-2016

This thesis focuses on gaining linguistic insights into textual discussions on a word level. It was of special interest to distinguish messages that constructively contribute to a discussion from those that are detrimental to them. Thereby, we wanted to determine whether "I"- and "You"-messages are indicators for either of the two discussion styles. These messages are nowadays often used in guidelines for successful communication. Although their effects have been successfully evaluated multiple times, a large-scale analysis has never been conducted. Thus, we used Wikipedia Articles for Deletion (short: AfD) discussions together with the records of blocked users and developed a fully automated creation of an annotated data set. In this data set, messages were labelled either constructive or disruptive. We applied binary classifiers to the data to determine characteristic words for both discussion styles. Thereby, we also investigated whether function words like pronouns and conjunctions play an important role in distinguishing the two. We found that "You"-messages were a strong indicator for disruptive messages which matches their attributed effects on communication. However, we found "I"-messages to be indicative for disruptive messages as well which is contrary to their attributed effects. The importance of function words could neither be confirmed nor refuted. Other characteristic words for either communication style were not found. Yet, the results suggest that a different model might represent disruptive and constructive messages in textual discussions better.

classifier, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

1603.08048

Country:

North America > United States (1.00)
Europe (0.92)

Genre: Research Report > New Finding (1.00)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.94)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(3 more...)

Add feedback

A Fistful of Bitcoins

Communications of the ACMMar-24-2016, 13:28:53 GMT

Bitcoin is a purely online virtual currency, unbacked by either physical commodities or sovereign obligation; instead, it relies on a combination of cryptographic protection and a peer-to-peer protocol for witnessing settlements. Consequently, Bitcoin has the unintuitive property that while the ownership of money is implicitly anonymous, its flow is globally visible. In this paper we explore this unique characteristic further, using heuristic clustering to group Bitcoin wallets based on evidence of shared authority, and then using re-identification attacks (i.e., empirical purchasing of goods and services) to classify the operators of those clusters. From this analysis, we consider the challenges for those seeking to use Bitcoin for criminal or fraudulent purposes at scale. Demand for low friction e-commerce of various kinds has driven a proliferation in online payment systems over the last decade. Thus, in addition to established payment card networks (e.g., Visa and Mastercard), a broad range of the so-called "alternative payments" has emerged including eWallets (e.g., Paypal, Google Checkout, and WebMoney), direct debit systems (typically via ACH, such as eBillMe), money transfer systems (e.g., Moneygram), and so on. However, virtually all of these systems have the property that they are denominated in existing fiat currencies (e.g., dollars), explicitly identify the payer in transactions, and are centrally or quasi-centrally administered. By far the most intriguing exception to this rule is Bitcoin. First deployed in 2009, Bitcoin is an independent online monetary system that combines some of the features of cash and existing online payment methods. Like cash, Bitcoin transactions do not explicitly identify the payer or the payee: a transaction is a cryptographically signed transfer of funds from one public key to another.

artificial intelligence, machine learning, transaction, (17 more...)

Communications of the ACM

Country:

North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > California > San Diego County > La Jolla (0.05)
North America > Canada (0.04)
(5 more...)

Industry:

Information Technology > Services > e-Commerce Services (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback

11 Important Model Evaluation Techniques Everyone Should Know

@machinelearnbotMar-23-2016, 14:40:28 GMT

Model evaluation metrics are used to assess goodness of fit between model and data, to compare different models, in the context of model selection, and to predict how predictions (associated with a specific model and data set) are expected to be accurate. Confidence intervals are used to assess how reliable a statistical estimate is. Wide confidence intervals mean that your model is poor (and it is worth investigating other models), or that your data is very noisy if confidence intervals don't improve by changing the model (that is, testing a different theoretical statistical distribution for your observations.) Modern confidence intervals are model-free, data -driven: click here to see how to compute them. A more general framework to assess and reduce sources of variance is called analysis of variance.

artificial intelligence, machine learning, test data, (18 more...)

@machinelearnbot

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

Quality and correctness of classification models. Part 3 – Confusion Matrix

@machinelearnbotMar-23-2016, 12:55:36 GMT

In the last part of the tutorial we introduced quantitative indicators of classification model quality. In the next two parts we will take a closer look at a couple of graphical indicators. The first one is called the Confusion Matrix (the name „Contingency Table" is also used).

artificial intelligence, confusion matrix, machine learning, (4 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback