AITopics | Performance Analysis

Performance Analysis

Learning From Noisy Labels By Regularized Estimation Of Annotator Confusion

Tanno, Ryutaro, Saeedi, Ardavan, Sankaranarayanan, Swami, Alexander, Daniel C., Silberman, Nathan

arXiv.org Machine Learning, Feb-10-2019

The predictive performance of supervised learning algorithms depends on the quality of labels. In a typical label collection process, multiple annotators provide subjective noisy estimates of the "truth" under the influence of their varying skill-levels and biases. Blindly treating these noisy labels as the ground truth limits the accuracy of learning algorithms in the presence of strong disagreement. This problem is critical for applications in domains such as medical imaging where both the annotation cost and inter-observer variability are high. In this work, we present a method for simultaneously learning the individual annotator model and the underlying true label distribution, using only noisy observations. Each annotator is modeled by a confusion matrix that is jointly estimated along with the classifier predictions. We propose to add a regularization term to the loss function that encourages convergence to the true annotator confusion matrix. We provide a theoretical argument as to how the regularization is essential to our approach both for the case of single annotator and multiple annotators. Despite the simplicity of the idea, experiments on image classification tasks with both simulated and real labels show that our method either outperforms or performs on par with the state-of-the-art methods and is capable of estimating the skills of annotators even with a single label available per image.
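The joint objective described in this abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the abstract does not say what form the regularization term takes, so the trace penalty used here, the weight `lam`, the function name `noisy_label_loss`, and the convention of marking missing labels with `-1` are all assumptions made for the example.

```python
import numpy as np

def noisy_label_loss(probs, confusions, noisy_labels, lam=0.01):
    """Cross-entropy on annotator labels plus an assumed trace penalty.

    probs:        (N, C) classifier probabilities over the true label.
    confusions:   (R, C, C) one row-stochastic matrix per annotator, where
                  confusions[r, i, j] = P(annotator r says j | true label i).
    noisy_labels: (N, R) integer labels, -1 where annotator r gave no label.
    lam:          regularization weight (hypothetical default).
    """
    total, count = 0.0, 0
    for r, cm in enumerate(confusions):
        observed = noisy_labels[:, r] >= 0
        if not observed.any():
            continue
        # Distribution over what annotator r would say for each image.
        annot_probs = probs[observed] @ cm
        labels = noisy_labels[observed, r]
        picked = annot_probs[np.arange(len(labels)), labels]
        total += -np.log(picked + 1e-12).sum()
        count += len(labels)
    cross_entropy = total / max(count, 1)
    # Assumed regularizer: sum of traces of the confusion matrices.
    penalty = lam * sum(np.trace(cm) for cm in confusions)
    return cross_entropy + penalty

# Toy usage: two images, one annotator who never makes mistakes.
probs = np.eye(2)
confusions = np.array([np.eye(2)])
noisy_labels = np.array([[0], [1]])
loss = noisy_label_loss(probs, confusions, noisy_labels, lam=0.1)
```

In a real training loop both `probs` and `confusions` would be differentiable outputs of a model; the sketch only evaluates the loss for fixed values.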

annotator, label noise, noisy label, (15 more...)

arXiv.org Machine Learning

1902.0368

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Introduction to "Advances in Financial Machine Learning" by Lopez de Prado

#artificialintelligence, Feb-9-2019, 09:06:54 GMT

Machine learning is a buzzword often thrown about when discussing the future of finance and the world. You may have heard of neural networks solving problems in facial recognition, language processing, and even financial markets, yet without much explanation. It is easy to view this field as a black box, a magic machine that somehow produces solutions, but nobody knows why it works. It is true that machine learning techniques (neural networks in particular) pick up on obscure and hard-to-explain features; however, there is more room for research, customization, and analysis than may first appear. Today we'll be discussing at a high level the various factors to be considered when researching investing through the lens of machine learning. The contents of this notebook and further discussions on this topic are heavily inspired by Marcos Lopez de Prado's book Advances in Financial Machine Learning.

artificial intelligence, financial machine learning, variance, (15 more...)

#artificialintelligence

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

Distance metric learning based on structural neighborhoods for dimensionality reduction and classification performance improvement

Ghods, Mostafa Razavi, Moattar, Mohammad Hossein, Forghani, Yahya

arXiv.org Machine Learning, Feb-9-2019

Distance metric learning can be viewed as one of the fundamental interests in pattern recognition and machine learning, which plays a pivotal role in the performance of many learning methods. One of the effective methods in learning such a metric is to learn it from a set of labeled training samples. The issue of data imbalance is the most important challenge of recent methods. This research tries not only to preserve the local structures but also to address the issue of imbalanced datasets. To do this, the proposed method first tries to extract a low dimensional manifold from the input data. Then, it learns the local neighborhood structures and the relationship of the data points in the ambient space based on the adjacencies of the same data points on the embedded low dimensional manifold. Using the local neighborhood relationships extracted from the manifold space, the proposed method learns the distance metric in a way which minimizes the distance between similar data and maximizes their distance from the dissimilar data points. The evaluations of the proposed method on numerous datasets from the UCI repository of machine learning, and also the KDDCup98 dataset as the most imbalanced dataset, demonstrate the superiority of the proposed approach in comparison with other approaches, especially when the imbalance factor is high.

dataset, distance metric learning, neighborhood, (8 more...)

arXiv.org Machine Learning

1902.03453

Country:

Asia > Middle East > Iran > Razavi Khorasan Province > Mashhad (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Inverse Projection Representation and Category Contribution Rate for Robust Tumor Recognition

Yang, Xiao-Hui, Tian, Li, Chen, Yun-Mei, Yang, Li-Jun, Xu, Shuang, Wu, Wen-Ming

arXiv.org Machine Learning, Feb-9-2019

Sparse representation based classification (SRC) methods have achieved remarkable results. SRC methods, however, still suffer from the need for sufficient training samples, insufficient use of test samples, and instability of representation. In this paper, a stable inverse projection representation based classification (IPRC) is presented to tackle these problems by effectively using test samples. An IPR is first proposed and its feasibility and stability are analyzed. A classification criterion named category contribution rate is constructed to match the IPR and complete classification. Moreover, a statistical measure is introduced to quantify the stability of representation-based classification methods. Based on the IPRC technique, a robust tumor recognition framework is presented by interpreting microarray gene expression data, where a two-stage hybrid gene selection method is introduced to select informative genes. Finally, the functional analysis of candidate pathogenicity-related genes is given. Extensive experiments on six public tumor microarray gene expression datasets demonstrate that the proposed technique is competitive with state-of-the-art methods.

category, classification, dataset, (12 more...)

arXiv.org Machine Learning

doi: 10.1109/TCBB.2018.2886334

1902.0351

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Henan Province (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Community detection of survey responses based on Pearson correlation coefficient with Neo4j

#artificialintelligence, Feb-8-2019, 22:05:27 GMT

Just a few days ago a new version of the Neo4j graph algorithms plugin was released. With the new release come new algorithms, and the Pearson correlation algorithm is one of them. To demonstrate how to use the Pearson correlation algorithm in Neo4j, we will use the data from the "Young People Survey" Kaggle dataset made available by Miroslav Sabo. It contains the results of 1010 filled-out surveys with questions ranging from music preferences and hobbies & interests to phobias. The nice thing about using Pearson correlation in scoring scenarios is that it takes into account when voters are generally more inclined to give higher or lower scores, as it compares each score to the average score of the user.
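The mean-centering property mentioned above can be illustrated with a small sketch. This is plain NumPy rather than the Neo4j procedure the article uses, and the function name `pearson`, the survey scores, and the choice to return 0.0 for a respondent with constant answers are assumptions made for the example.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equally long score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Subtract each respondent's own average score before comparing.
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0:
        # Convention chosen for this sketch: a constant vector carries no
        # information about co-variation, so report zero correlation.
        return 0.0
    return float((xc * yc).sum() / denom)

# Hypothetical answers to the same four questions on a 1-5 scale.
strict_rater = [1, 2, 1, 3]
generous_rater = [3, 4, 3, 5]   # same pattern, shifted up by two points
opposite_rater = [3, 2, 3, 1]   # mirrored pattern

same_taste = pearson(strict_rater, generous_rater)
opposite_taste = pearson(strict_rater, opposite_rater)
```

Because each vector is centered on its own mean, the strict and the generous rater come out as perfectly correlated even though their raw scores never coincide.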

artificial intelligence, correlation, machine learning, (12 more...)

#artificialintelligence

Genre: Questionnaire & Opinion Survey (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Twitter Still Can't Keep Up With Its Flood of Junk Accounts, Study Finds

WIRED, Feb-8-2019, 13:46:35 GMT

Since the world learned of state-sponsored campaigns to spread disinformation on social media and sway the 2016 election, Twitter has scrambled to rein in the bots and trolls polluting its platform. But when it comes to the larger problem of automated accounts on Twitter designed to spread spam and scams, inflate follower counts, and game trending topics, one study argues that the company still isn't keeping up with the deluge of garbage and abuse. In fact, the paper's two researchers write that with a machine learning approach they developed themselves, they could identify abusive accounts in far greater volumes and faster than Twitter does--often flagging the accounts months before Twitter spotted and banned them. In a 16-month study of 1.5 billion tweets, Zubair Shafiq, a computer science professor at the University of Iowa, and his graduate student Shehroze Farooqi identified more than 167,000 apps using Twitter's API to automate bot accounts that spread tens of millions of tweets pushing spam, links to malware, and astroturfing campaigns. They write that more than 60 percent of the time, Twitter waited for those apps to send more than 100 tweets before identifying them as abusive; the researchers' own detection method had flagged the vast majority of the malicious apps after just a handful of tweets.

artificial intelligence, machine learning, social media, (20 more...)

WIRED

Country: North America > United States > Iowa (0.29)

Genre: Research Report > New Finding (0.90)

Industry:

Information Technology > Services (0.90)
Government > Voting & Elections (0.55)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

'I nearly aborted my baby because of an unreliable test'

BBC News, Feb-8-2019, 07:06:34 GMT

When Claire Bell became pregnant she paid for a test that would indicate whether the baby had Down's Syndrome - and agreed to be screened for some other rare conditions at the same time. Not long afterwards, writes the BBC's Charlotte Hayward, she received what appeared to be terrible news. For five years, Claire Bell's husband was treated for two types of cancer. When it finally came to an end the couple decided to try having a baby through IVF, using some sperm her husband had had frozen and stored before he had chemotherapy. On the first round, at the age of 41, she became pregnant - and felt incredibly lucky. "It was this miraculous pregnancy," she says.

artificial intelligence, machine learning, turner syndrome, (16 more...)

BBC News

Country: Europe > United Kingdom (0.71)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.90)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.70)
Health & Medicine > Therapeutic Area > Genetic Disease (0.57)

Technology:

Information Technology > Communications > Social Media (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)

Link Prediction via Higher-Order Motif Features

Abuoda, Ghadeer, Morales, Gianmarco De Francisci, Aboulnaga, Ashraf

arXiv.org Machine Learning, Feb-8-2019

Link prediction requires predicting which new links are likely to appear in a graph. Being able to predict unseen links with good accuracy has important applications in several domains such as social media, security, transportation, and recommendation systems. A common approach is to use features based on the common neighbors of an unconnected pair of nodes to predict whether the pair will form a link in the future. In this paper, we present an approach for link prediction that relies on higher-order analysis of the graph topology, well beyond common neighbors. We treat the link prediction problem as a supervised classification problem, and we propose a set of features that depend on the patterns or motifs that a pair of nodes occurs in. By using motifs of sizes 3, 4, and 5, our approach captures a high level of detail about the graph topology within the neighborhood of the pair of nodes, which leads to a higher classification accuracy. In addition to proposing the use of motif-based features, we also propose two optimizations related to constructing the classification dataset from the graph. First, to ensure that positive and negative examples are treated equally when extracting features, we propose adding the negative examples to the graph as an alternative to the common approach of removing the positive ones. Second, we show that it is important to control for the shortest-path distance when sampling pairs of nodes to form negative examples, since the difficulty of prediction varies with the shortest-path distance. We experimentally demonstrate that using off-the-shelf classifiers with a well-constructed classification dataset results in an increase of up to 10 percentage points in accuracy over prior topology-based and feature learning methods.
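For orientation, the common-neighbor baseline that this abstract contrasts itself with can be sketched in a few lines. This is only the baseline feature extraction, not the paper's motif features; the helper names `common_neighbors`, `jaccard`, and `pair_features`, and the toy graph, are hypothetical.

```python
def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """Shared neighbors divided by all neighbors of either node."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def pair_features(adj, u, v):
    """Feature dictionary for one candidate link, ready for a classifier."""
    return {
        "common_neighbors": common_neighbors(adj, u, v),
        "jaccard": jaccard(adj, u, v),
    }

# Toy undirected graph stored as adjacency sets; a and d are not linked.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
features = pair_features(adj, "a", "d")
```

In a supervised setup such dictionaries would be computed for sampled positive and negative node pairs and passed to an off-the-shelf classifier; the motif-based approach described above would add counts of larger subgraph patterns to the same feature vector.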

accuracy, graph, motif, (16 more...)

arXiv.org Machine Learning

1902.06679

Country:

Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Asia > China (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry: Information Technology (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Machine learning and chord based feature engineering for genre prediction in popular Brazilian music

Wundervald, Bruna D., Zeviani, Walmes M.

arXiv.org Machine Learning, Feb-8-2019

Music genre can be hard to describe: many factors are involved, such as style, music technique, and historical context. Some genres even have overlapping characteristics. Looking for a better understanding of how music genres are related to musical harmonic structures, we gathered data about the music chords for thousands of popular Brazilian songs. Here, 'popular' does not only refer to the genre named MPB (Brazilian Popular Music) but to nine different genres that were considered particular to the Brazilian case. The main goals of the present work are to extract and engineer harmonically related features from chord data and to use them to classify popular Brazilian music genres, towards establishing a connection between harmonic relationships and Brazilian genres. We also emphasize the generalisation of the method for obtaining the data, allowing for the replication and direct extension of this work. Our final model is a combination of multiple classification trees, also known as the random forest model. We found that features extracted from harmonic elements can satisfactorily predict music genre for the Brazilian case, as well as features obtained from the Spotify API. The variables considered in this work also give an intuition about how they relate to the genres.

chord, information, music genre, (16 more...)

arXiv.org Machine Learning

1902.03283

Country:

South America > Brazil (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
Europe > Portugal > Braga > Braga (0.04)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)

An analytic formulation for positive-unlabeled learning via weighted integral probability metric

Kwon, Yongchan, Kim, Wonyoung, Sugiyama, Masashi, Paik, Myunghee Cho

arXiv.org Machine Learning, Feb-8-2019

We consider the problem of learning a binary classifier from only positive and unlabeled observations (PU learning). Although recent research in PU learning has succeeded in showing theoretical and empirical performance, most existing algorithms need to solve either a convex or a non-convex optimization problem and thus are not suitable for large-scale datasets. In this paper, we propose a simple yet theoretically grounded PU learning algorithm by extending the previous work proposed for supervised binary classification (Sriperumbudur et al., 2012). The proposed PU learning algorithm produces a closed-form classifier when the hypothesis space is a closed ball in reproducing kernel Hilbert space. In addition, we establish upper bounds of the estimation error and the excess risk. The obtained estimation error bound is sharper than existing results and the excess risk bound does not rely on an approximation error term. To the best of our knowledge, we are the first to explicitly derive the excess risk bound in the field of PU learning. Finally, we conduct extensive numerical experiments using both synthetic and real datasets, demonstrating improved accuracy, scalability, and robustness of the proposed algorithm.

algorithm, dataset, probability, (16 more...)

arXiv.org Machine Learning

1901.09503

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > South Korea > Seoul > Seoul (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
