AITopics

2002.02883

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence Feb-10-2020, 07:11:02 GMT

The Basics: evaluating classifiers

Judging a classification model feels like it should be easier than judging a regression model. After all, a prediction from a classification model can only be right or wrong, while a prediction from a regression model can be off by any amount, large or small. Yet judging a classifier is not as simple as it may seem. There is more than one way for a classification to be right or wrong, and there are multiple ways to combine those outcomes into a single metric. Of course, these metrics have different, frequently unintuitive names -- precision, recall, F1, ROC curves -- making the process seem a little forbidding from the outside.
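As a minimal sketch of how the "ways to be right and wrong" combine into the metrics named above, the following computes precision, recall, and F1 from raw counts. The function name and the example counts are hypothetical, chosen only for illustration; returning 0.0 for an undefined ratio is one common convention, not the only one.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts.

    A ratio with a zero denominator is reported as 0.0 (an assumed convention).
    """
    # Precision: of everything predicted positive, how much was truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(precision, recall, f1)
```

Note that true negatives do not appear in any of the three formulas, which is one reason these metrics can disagree with plain accuracy on imbalanced data.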

heart attack, prediction, threshold, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

#artificialintelligence Feb-10-2020, 03:32:56 GMT

How to solve 90% of NLP problems: a step-by-step guide

How you can apply the 5 W's and H to Text Data! Whether you are an established company or working to launch a new service, you can always leverage text data to validate, improve, and expand the functionalities of your product. The science of extracting meaning and learning from text data is an active topic of research called Natural Language Processing (NLP). NLP produces new and exciting results on a daily basis, and is a very large field. While many NLP papers and tutorials exist online, we have found it hard to find guidelines and tips on how to approach these problems efficiently from the ground up.

classifier, prediction, tweet, (16 more...)

Country:

North America > United States (0.04)
Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Das, Subhro, Lade, Prasanth, Srinivasan, Soundar

Model adaptation and unsupervised learning with non-stationary batch data under smooth concept drift

arXiv.org Machine Learning Feb-10-2020

Most predictive models assume that training and test data are generated from a stationary process. However, this assumption does not hold true in practice. In this paper, we consider the scenario of a gradual concept drift due to the underlying non-stationarity of the data source. While previous work has investigated this scenario under supervised-learning and adaptation conditions, few have addressed the common, real-world scenario in which labels are only available during training. We propose a novel, iterative algorithm for unsupervised adaptation of predictive models. We show that the performance of our batch-adapted prediction algorithm is better than that of its corresponding unadapted version. The proposed algorithm provides similar (or better, in most cases) performance in significantly less run time than other state-of-the-art methods.

adaptation, algorithm, concept drift, (16 more...)

2002.04094

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

Roshanfekr, Saeideh, Esmaeili, Shahriar, Ataeian, Hassan, Khas, Neda Maleki, Amiri, Ali

UGRWO-Sampling: A modified random walk under-sampling approach based on graphs to imbalanced data classification

arXiv.org Machine Learning Feb-9-2020

In this paper, we propose a new graph-based RWO-Sampling (Random Walk Over-Sampling) method for imbalanced datasets. In this method, two graphs based on under-sampling and over-sampling methods are introduced to keep the proximity information, which is robust to noise and outliers. After the first graph is constructed on the minority class, RWO-Sampling is applied to selected samples, and the rest remain unchanged. The second graph is constructed for the majority class, and the samples in low-density areas (outliers) are removed. In the proposed method, majority-class examples in high-density areas are selected, and the rest are eliminated. Furthermore, by utilizing RWO-Sampling, the boundary of the minority class is expanded, though the number of outliers does not increase. The method is tested on nine continuous-attribute datasets with different over-sampling rates, and a number of evaluation measures are compared against previous methods. The experimental results indicate the high efficiency and flexibility of the proposed method for the classification of imbalanced data.
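The abstract does not spell out the random-walk step itself. The sketch below illustrates only the basic over-sampling idea as it is commonly described: each synthetic point is a minority sample shifted per attribute by a Gaussian step scaled by that attribute's standard deviation over the square root of the minority-class size. That update rule, the use of the population standard deviation, and the function name `rwo_oversample` are assumptions made here; the graph construction and sample selection of the proposed method are not shown.

```python
import math
import random
import statistics


def rwo_oversample(minority, n_new, seed=0):
    """Generate n_new synthetic minority samples by a per-attribute random walk.

    Assumed update rule (not taken from the abstract):
        x'_j = x_j - (sigma_j / sqrt(n)) * r,   r ~ N(0, 1)
    """
    rng = random.Random(seed)
    n = len(minority)
    dims = len(minority[0])
    # Per-attribute standard deviation of the minority class (population form).
    sigmas = [statistics.pstdev([row[j] for row in minority]) for j in range(dims)]
    steps = [s / math.sqrt(n) for s in sigmas]
    synthetic = []
    for k in range(n_new):
        base = minority[k % n]  # cycle through the minority samples
        synthetic.append([x - steps[j] * rng.gauss(0.0, 1.0)
                          for j, x in enumerate(base)])
    return synthetic


# Hypothetical minority class with two continuous attributes.
minority = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
new_points = rwo_oversample(minority, n_new=6)
print(new_points)
```

Because the step size shrinks with the square root of the class size, the synthetic points stay close to the observed minority distribution rather than spreading arbitrarily.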

classifier, dataset, ugrwo-sampling, (15 more...)

2002.03521

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Iran > Zanjan Province > Zanjan (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Machine Learning Feb-9-2020

Out-of-Distribution Detection with Distance Guarantee in Deep Generative Models

Zhang, Yufeng, Liu, Wanwei, Chen, Zhenbang, Wang, Ji, Liu, Zhiming, Li, Kenli, Wei, Hongmei, Chen, Zuoning

Recent research has shown that it is challenging to detect out-of-distribution (OOD) data in deep generative models, including flow-based models and variational autoencoders (VAEs). In this paper, we prove a theorem that, for a well-trained flow-based model, the distance between the distribution of representations of an OOD dataset and the prior can be large enough, as long as the distance between the distributions of the training dataset and the OOD dataset is large enough. Furthermore, our observation shows that, for flow-based models and VAEs with a factorized prior, the representations of OOD datasets are more correlated than those of the training dataset. Based on our theorem and observation, we propose detecting OOD data according to the total correlation of representations in flow-based models and VAEs. Experimental results show that our method can achieve nearly 100% AUROC on all the widely used benchmarks and is robust against data manipulation. By contrast, the state-of-the-art method performs no better than random guessing on challenging problems and can be fooled by data manipulation in almost all cases.
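To make the AUROC figure concrete, here is a minimal sketch of the metric for an OOD detector: the probability that a randomly chosen OOD sample receives a higher score than a randomly chosen in-distribution sample, with ties counted as one half. The scores are hypothetical numbers and are not derived from total correlation or from any result in the paper.

```python
def auroc(in_dist_scores, ood_scores):
    """AUROC via pairwise comparison (higher score means "more likely OOD")."""
    wins = 0.0
    for o in ood_scores:
        for i in in_dist_scores:
            if o > i:
                wins += 1.0   # OOD sample correctly ranked above in-distribution
            elif o == i:
                wins += 0.5   # ties count as half
    return wins / (len(ood_scores) * len(in_dist_scores))


# Hypothetical detector scores.
in_dist = [0.1, 0.2, 0.3, 0.4]
ood = [0.35, 0.8, 0.9]
score = auroc(in_dist, ood)
print(score)
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is the scale on which the abstract's "nearly 100%" and "no better than random guessing" should be read.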

dataset, fashionmnist, representation, (16 more...)

2002.03328

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hunan Province > Changsha (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.71)

Wang, Sen, Chang, J. Morris

Privacy-Preserving Image Classification in the Local Setting

arXiv.org Machine Learning Feb-8-2020

Image data is produced in great volume by individuals and commercial vendors in daily life, and it is used across various domains, such as advertising, medicine, and traffic analysis. Recently, image data has also become important for social utility, such as emergency response. However, privacy concerns are the biggest obstacle to further exploration of image data, because an image can reveal sensitive information, such as personal identity and location. The recently developed Local Differential Privacy (LDP) offers a promising solution: it allows data owners to randomly perturb their input before releasing it, which provides plausible deniability of the data. In this paper, we consider a two-party image classification problem, in which data owners hold the images and an untrustworthy data user would like to fit a machine learning model with these images as input. To protect image privacy, we propose to locally perturb the image representation before revealing it to the data user. Subsequently, we analyze how the perturbation satisfies ε-LDP and affects the data utility for count-based and distance-based machine learning algorithms, and propose a supervised image feature extractor, DCAConv, which produces an image representation with a scalable domain size. Our experiments show that DCAConv can maintain high data utility while preserving privacy on multiple image benchmark datasets.
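The abstract does not state which perturbation mechanism is used. As a sketch of what "randomly perturb their input" can look like under ε-LDP, the following applies classic binary randomized response to each bit of a feature vector. This is a swapped-in textbook mechanism, not DCAConv or the authors' method, and the function names and feature values are hypothetical.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.

    For a single bit this satisfies eps-LDP: the two possible inputs produce
    any given output with probabilities whose ratio is at most e^eps.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def perturb_bits(bits, epsilon, seed=0):
    """Perturb each bit independently on the data owner's side.

    Note: spending epsilon on each of d bits costs d * epsilon in total
    under basic sequential composition.
    """
    rng = random.Random(seed)
    return [randomized_response(b, epsilon, rng) for b in bits]


# Hypothetical binarized image features held by a data owner.
features = [1, 0, 1, 1, 0, 0, 1, 0]
noisy = perturb_bits(features, epsilon=1.0)
print(noisy)
```

Smaller values of ε flip bits more often, which strengthens deniability but lowers the utility of the released representation; this is the trade-off the abstract analyzes.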

dcaconv, privacy, representation, (17 more...)

2002.03261

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Florida (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

#artificialintelligence Feb-7-2020, 18:39:43 GMT

Reading a confusion matrix

What is a confusion matrix? A confusion matrix is a method of visualizing classification results. You have built a classification model that predicts some values on the test set, and you also have the actual values of your target variable to compare against. A confusion matrix shows you whether your predictions match reality and, in more detail, how they match. The confusion matrix below shows predicted versus actual values and gives names to the classification pairs: true positives, true negatives, false negatives, and false positives.
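As a minimal sketch of how the four named cells are tallied for a binary classifier, the following counts them from paired actual and predicted labels. The labels are hypothetical, and treating `1` as the positive class is an assumption of this example.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives/negatives for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the actual label is positive too.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the actual label was positive.
            counts["FN" if a == positive else "TN"] += 1
    return {name: counts[name] for name in ("TP", "FP", "FN", "TN")}


# Hypothetical test-set labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
cells = confusion_counts(actual, predicted)
print(cells)
```

Arranged as a 2x2 grid with actual values on one axis and predicted values on the other, these four counts are the confusion matrix.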

confusion matrix

Industry: Health & Medicine (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

#artificialintelligence Feb-7-2020, 01:35:44 GMT

Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study

Mammography is the current standard for breast cancer screening. This study aimed to develop an artificial intelligence (AI) algorithm for diagnosis of breast cancer in mammography, and explore whether it could benefit radiologists by improving accuracy of diagnosis.

artificial intelligence, cancer detection and false-positive recall, radiologist, (11 more...)

Country:

North America > United States (0.09)
Asia > South Korea (0.09)
Europe > United Kingdom (0.06)

Genre: Research Report > Experimental Study (0.50)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Applied AI (0.68)

Manchanda, Saurav, Yadav, Pranjul, Doan, Khoa, Keerthi, S. Sathiya

Targeted display advertising: the case of preferential attachment

arXiv.org Machine Learning Feb-7-2020

An average adult is exposed to hundreds of digital advertisements daily (https://www.mediadynamicsinc.com/uploads/files/PR092214-Note-only-150-Ads-2mk.pdf), making the digital advertisement industry a classic example of a big-data-driven platform. As such, the ad-tech industry relies on historical engagement logs (clicks or purchases) to identify potentially interested users for the advertisement campaign of a partner (a seller who wants to target users for its products). The number of advertisements that are shown for a partner, and hence the historical campaign data available for that partner, depends upon the budget constraints of the partner. Thus, enough data can be collected for the high-budget partners to make accurate predictions, while this is not the case for the low-budget partners. This skewed distribution of the data leads to "preferential attachment" of the targeted display advertising platforms towards the high-budget partners. In this paper, we develop "domain-adaptation" approaches to address the challenge of predicting interested users for the partners with insufficient data, i.e., the tail partners. Specifically, we develop simple yet effective approaches that leverage the similarity among the partners to transfer information from the partners with sufficient data to cold-start partners, i.e., partners without any campaign data. Our approaches readily adapt to new campaign data by incremental fine-tuning, and hence work at varying points of a campaign, not just at the cold start. We present an experimental analysis on the historical logs of a major display advertising platform (https://www.criteo.com/). Specifically, we evaluate our approaches across 149 partners, at varying points of their campaigns. Experimental results show that the proposed approaches outperform the other "domain-adaptation" approaches at different time points of the campaigns.

advertisement, prediction, representation, (17 more...)

2002.02879

Country:

North America > United States > Minnesota (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Ireland (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Marketing (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)