AITopics | Accuracy

Collaborating Authors

Accuracy

News Overviews Instructional Materials AI-Alerts Classics

Probabilistic Modeling for Novelty Detection with Applications to Fraud Identification

arXiv.org Machine LearningMar-5-2019

Novelty detection is the unsupervised problem of identifying anomalies in test data which significantly differ from the training set. Novelty detection is one of the classic challenges in Machine Learning and a core component of several research areas such as fraud detection, intrusion detection, medical diagnosis, data cleaning, and fault prevention. While numerous algorithms were designed to address this problem, most methods are only suitable to model continuous numerical data. Tackling datasets composed of mixed-type features, such as numerical and categorical data, or temporal datasets describing discrete event sequences is a challenging task. In addition to the supported data types, the key criteria for efficient novelty detection methods are the ability to accurately dissociate novelties from nominal samples, the interpretability, the scalability and the robustness to anomalies located in the training data. In this thesis, we investigate novel ways to tackle these issues. In particular, we propose (i) an experimental comparison of novelty detection methods for mixed-type data (ii) an experimental comparison of novelty detection methods for sequence data, (iii) a probabilistic nonparametric novelty detection method for mixed-type data based on Dirichlet process mixtures and exponential-family distributions and (iv) an autoencoder-based novelty detection model with encoder/decoder modelled as deep Gaussian processes.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

1903.0173

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > San Francisco County > San Francisco (0.13)
Asia > Middle East > Jordan (0.04)
(12 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Consumer Products & Services (0.92)
Transportation > Air (0.92)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
(3 more...)

Add feedback

Automatic Classification of Pathology Reports using TF-IDF Features

Kalra, Shivam, Li, Larry, Tizhoosh, Hamid R.

arXiv.org Machine LearningMar-5-2019

A Pathology report is arguably one of the most important documents in medicine containing interpretive information about the visual findings from the patient's biopsy sample. Each pathology report has a retention period of up to 20 years after the treatment of a patient. Cancer registries process and encode high volumes of free-text pathology reports for surveillance of cancer and tumor diseases all across the world. In spite of their extremely valuable information they hold, pathology reports are not used in any systematic way to facilitate computational pathology. Therefore, in this study, we investigate automated machine-learning techniques to identify/predict the primary diagnosis (based on ICD-O code) from pathology reports. We performed experiments by extracting the TF-IDF features from the reports and classifying them using three different methods---SVM, XGBoost, and Logistic Regression. We constructed a new dataset with 1,949 pathology reports arranged into 37 ICD-O categories, collected from four different primary sites, namely lung, kidney, thymus, and testis. The reports were manually transcribed into text format after collecting them as PDF files from NCI Genomic Data Commons public dataset. We subsequently pre-processed the reports by removing irrelevant textual artifacts produced by OCR software. The highest classification accuracy we achieved was 92\% using XGBoost classifier on TF-IDF feature vectors, the linear SVM scored 87\% accuracy. Furthermore, the study shows that TF-IDF vectors are suitable for highlighting the important keywords within a report which can be helpful for the cancer research and diagnostic workflow. The results are encouraging in demonstrating the potential of machine learning methods for classification and encoding of pathology reports.

artificial intelligence, machine learning, pathology report, (16 more...)

arXiv.org Machine Learning

1903.07406

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.55)
Research Report > Experimental Study (0.34)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.97)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Add feedback

QuickStop: A Markov Optimal Stopping Approach for Quickest Misinformation Detection

Wei, Honghao, Kang, Xiaohan, Wang, Weina, Ying, Lei

arXiv.org Machine LearningMar-4-2019

This paper combines data-driven and model-driven methods for real-time misinformation detection. Our algorithm, named QuickStop, is an optimal stopping algorithm based on a probabilistic information spreading model obtained from labeled data. The algorithm consists of an offline machine learning algorithm for learning the probabilistic information spreading model and an online optimal stopping algorithm to detect misinformation. The online detection algorithm has both low computational and memory complexities. Our numerical evaluations with a real-world dataset show that QuickStop outperforms existing misinformation detection algorithms in terms of both accuracy and detection time (number of observations needed for detection). Our evaluations with synthetic data further show that QuickStop is robust to (offline) learning errors.

artificial intelligence, machine learning, misinformation, (16 more...)

arXiv.org Machine Learning

1903.04887

Genre: Research Report (0.40)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

The Fuzzy ROC

Parmigiani, Giovanni

arXiv.org Machine LearningMar-4-2019

The fuzzy ROC extends Receiver Operating Curve (ROC) visualization to the situation where some data points, falling in an indeterminacy region, are not classified. It addresses two challenges: definition of sensitivity and specificity bounds under indeterminacy; and visual summarization of the large number of possibilities arising from different choices of indeterminacy zones.

artificial intelligence, gray zone, machine learning, (18 more...)

arXiv.org Machine Learning

1903.01868

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.52)

Add feedback

Learning STRIPS Action Models with Classical Planning

Aineto, Diego, Jiménez, Sergio, Onaindia, Eva

arXiv.org Artificial IntelligenceMar-4-2019

This paper presents a novel approach for learning STRIPS action models from examples that compiles this inductive learning task into a classical planning task. Interestingly, the compilation approach is flexible to different amounts of available input knowledge; the learning examples can range from a set of plans (with their corresponding initial and final states) to just a pair of initial and final states (no intermediate action or state is given). Moreover, the compilation accepts partially specified action models and it can be used to validate whether the observation of a plan execution follows a given STRIPS action model, even if this model is not fully specified.

action model, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1903.01153

Country: Europe > Spain (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Fairness in Recommendation Ranking through Pairwise Comparisons

Beutel, Alex, Chen, Jilin, Doshi, Tulsee, Qian, Hai, Wei, Li, Wu, Yi, Heldt, Lukasz, Zhao, Zhe, Hong, Lichan, Chi, Ed H., Goodrow, Cristos

arXiv.org Artificial IntelligenceMar-2-2019

Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information. As such it is important to ask: what are the possible fairness risks, how can we quantify them, and how should we address them? In this paper we offer a set of novel metrics for evaluating algorithmic fairness concerns in recommender systems. In particular we show how measuring fairness based on pairwise comparisons from randomized experiments provides a tractable means to reason about fairness in rankings from recommender systems. Building on this metric, we offer a new regularizer to encourage improving this metric during model training and thus improve fairness in the resulting rankings. We apply this pairwise regularization to a large-scale, production recommender system and show that we are able to significantly improve the system's pairwise fairness.

artificial intelligence, fairness, machine learning, (12 more...)

arXiv.org Artificial Intelligence

1903.0078

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre: Research Report > Experimental Study (0.74)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models (Extended Version)

Onwuzurike, Lucky, Mariconti, Enrico, Andriotis, Panagiotis, De Cristofaro, Emiliano, Ross, Gordon, Stringhini, Gianluca

arXiv.org Artificial IntelligenceMar-2-2019

As Android has become increasingly popular, so has malware targeting it, thus pushing the research community to propose different detection techniques. However, the constant evolution of the Android ecosystem, and of malware itself, makes it hard to design robust tools that can operate for long periods of time without the need for modifications or costly re-training. Aiming to address this issue, we set to detect malware from a behavioral point of view, modeled as the sequence of abstracted API calls. We introduce MaMaDroid, a static-analysis based system that abstracts the API calls performed by an app to their class, package, or family, and builds a model from their sequences obtained from the call graph of an app as Markov chains. This ensures that the model is more resilient to API changes and the features set is of manageable size. We evaluate MaMaDroid using a dataset of 8.5K benign and 35.5K malicious apps collected over a period of six years, showing that it effectively detects malware (with up to 0.99 F-measure) and keeps its detection capabilities for long periods of time (up to 0.87 F-measure two years after training). We also show that MaMaDroid remarkably outperforms DroidAPIMiner, a state-of-the-art detection system that relies on the frequency of (raw) API calls. Aiming to assess whether MaMaDroid's effectiveness mainly stems from the API abstraction or from the sequencing modeling, we also evaluate a variant of it that uses frequency (instead of sequences), of abstracted API calls. We find that it is not as accurate, failing to capture maliciousness when trained on malware samples that include API calls that are equally or more frequently used by benign apps.

artificial intelligence, machine learning, mamadroid, (19 more...)

arXiv.org Artificial Intelligence

1711.07477

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Catching Bugs Without Really Trying

#artificialintelligenceMar-1-2019, 17:31:36 GMT

Finding and fixing bugs is critical to delivering quality software. One-time new bugs are often introduced into a system when changes are uploaded to a software repository. Changes may be due to adding new features or possibly correcting existing bugs. Mozilla is planning on taking advantage of research being done by Ubisoft using artificial intelligence (AI) and machine learning (ML) to automatically catch software bugs when source code is committed to a software repository. The software is called CLEVER for Combining Levels of Bug Prevention and Resolution techniques.

artificial intelligence, clever, machine learning, (8 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

Add feedback

8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset

#artificialintelligenceMar-1-2019, 06:56:03 GMT

Has this happened to you? You are working on your dataset. You create a classification model and get 90% accuracy immediately. You dive a little deeper and discover that 90% of the data belongs to one class. This is an example of an imbalanced dataset and the frustrating results it can cause.

artificial intelligence, data mining, machine learning, (16 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.30)

Add feedback

8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset

#artificialintelligenceMar-1-2019, 06:56:02 GMT

artificial intelligence, data mining, machine learning, (16 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.30)

Add feedback