AITopics | Accuracy

Accuracy

Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection

Sun, Ruimin, Yuan, Xiaoyong, He, Pan, Zhu, Qile, Chen, Aokun, Gregio, Andre, Oliveira, Daniela, Li, Xiaolin

arXiv.org Machine Learning | Dec-4-2017

In this paper, we introduce and evaluate PROPEDEUTICA, a novel methodology and framework for efficient and effective real-time malware detection, leveraging the best of conventional machine learning (ML) and deep learning (DL) algorithms. In PROPEDEUTICA, all software processes in the system start execution subject to a conventional ML detector for fast classification. If a piece of software receives a borderline classification, it undergoes further analysis by a more computationally expensive and more accurate DL method, our newly proposed DL algorithm DEEPMALWARE. Further, we introduce delays to the execution of software subjected to deep learning analysis as a way to "buy time" for DL analysis and to rate-limit the impact of possible malware in the system. We evaluated PROPEDEUTICA with a set of 9,115 malware samples and 877 commonly used benign software samples from various categories for the Windows OS. Our results show that the false positive rate for conventional ML methods can reach 20%, and for modern DL methods it is usually below 6%. However, the classification time for DL can be 100 times longer than for conventional ML methods. PROPEDEUTICA improved the detection F1-score from 77.54% (conventional ML method) to 90.25%, and reduced the detection time by 54.86%. The percentage of software subjected to DL analysis was approximately 40% on average, and the application of delays in software subjected to ML reduced the detection time by approximately 10%. Finally, we found and discussed a discrepancy between the detection accuracy offline (analysis after all traces are collected) and on-the-fly (analysis in tandem with trace collection). Our insights show that conventional ML and modern DL-based malware detectors in isolation cannot meet the needs of efficient and effective malware detection: high accuracy, low false positive rate, and short classification time.
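The two-stage idea in this abstract (fast model first, slower model only for borderline cases) can be sketched roughly as follows. This is an illustration, not the authors' implementation: `fast_score`, `slow_score`, the toy scorers, and the `low`/`high` band that defines "borderline" are all hypothetical, and the execution-delay mechanism is not modeled.

```python
from typing import Callable, Sequence, Tuple

Scorer = Callable[[Sequence[float]], float]


def cascade_predict(
    trace: Sequence[float],
    fast_score: Scorer,
    slow_score: Scorer,
    low: float = 0.3,
    high: float = 0.7,
) -> Tuple[str, str]:
    """Return (label, stage) for one process trace.

    fast_score and slow_score are hypothetical stand-ins that return an
    estimated malware probability in [0, 1]. The open band (low, high)
    is an assumed definition of a "borderline" fast-model score.
    """
    p = fast_score(trace)
    if p <= low:
        return "benign", "fast"
    if p >= high:
        return "malware", "fast"
    # Borderline score: defer to the slower, more accurate model.
    p = slow_score(trace)
    return ("malware" if p >= 0.5 else "benign"), "slow"


# Toy scorers for demonstration only; they are not real detectors.
def toy_fast(trace: Sequence[float]) -> float:
    return sum(trace) / len(trace)


def toy_slow(trace: Sequence[float]) -> float:
    return max(trace)


print(cascade_predict([0.1, 0.2], toy_fast, toy_slow))
print(cascade_predict([0.4, 0.9], toy_fast, toy_slow))
```

Widening the band sends more traces to the slow model, trading classification time for accuracy; the abstract reports that about 40% of software reached the DL stage on average.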

artificial intelligence, machine learning, system call, (17 more...)

arXiv.org Machine Learning

1712.01145

Country: North America > United States (0.15)

Genre: Research Report > New Finding (0.85)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Where Classification Fails, Interpretation Rises

Nguyen, Chanh, Georgiev, Georgi, Ji, Yujie, Wang, Ting

arXiv.org Machine Learning | Dec-2-2017

An intriguing property of deep neural networks is their inherent vulnerability to adversarial inputs, which significantly hinders their application in security-critical domains. Most existing detection methods attempt to use carefully engineered patterns to distinguish adversarial inputs from their genuine counterparts, which, however, can often be circumvented by adaptive adversaries. In this work, we take a completely different route by leveraging the definition of adversarial inputs: while deceptive to deep neural networks, they are barely discernible to human vision. Building upon recent advances in interpretable models, we construct a new detection framework that contrasts an input's interpretation against its classification. We validate the efficacy of this framework through extensive experiments using benchmark datasets and attacks. We believe that this work opens a new direction for designing adversarial input detection methods.

artificial intelligence, arxiv preprint arxiv, machine learning, (17 more...)

arXiv.org Machine Learning

1712.00558

Country: North America > United States (0.29)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Distributional Equivalence and Structure Learning for Bow-free Acyclic Path Diagrams

Nowzohour, Christopher, Maathuis, Marloes H., Evans, Robin J., Bühlmann, Peter

arXiv.org Machine Learning | Dec-2-2017

We consider the problem of structure learning for bow-free acyclic path diagrams (BAPs). BAPs can be viewed as a generalization of linear Gaussian DAG models that allow for certain hidden variables. We present a first method for this problem using a greedy score-based search algorithm. We also prove some necessary and some sufficient conditions for distributional equivalence of BAPs, which are used in an algorithmic approach to compute (nearly) equivalent model structures. This allows us to infer lower bounds of causal effects. We also present applications to real and simulated datasets using our publicly available R-package.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1508.01717

Country: Europe > Austria (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Subject Selection on a Riemannian Manifold for Unsupervised Cross-subject Seizure Detection

Bolagh, Samaneh Nasiri Ghosheh, Clifford, Gari D.

arXiv.org Machine Learning | Dec-1-2017

Variability between individuals poses a challenge in inter-subject brain signal analysis problems. A new algorithm for subject selection based on clustering covariance matrices on a Riemannian manifold is proposed. After unsupervised selection of the subsets of relevant subjects, data in a cluster is mapped to a tangent space at the mean point of the covariance matrices in that cluster, and an SVM classifier is trained on labeled data from relevant subjects. An experiment on an EEG seizure database shows that the proposed method increases the accuracy over the state of the art from 86.83% to 89.84% and specificity from 87.38% to 89.64% while reducing the false positive rate/hour from 0.8/hour to 0.77/hour.
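The tangent-space step mentioned in this abstract can be illustrated with a small NumPy sketch. It uses the standard log map for symmetric positive definite (SPD) matrices, log(R^{-1/2} C R^{-1/2}), at a reference point R. As a simplifying assumption, the reference here is the arithmetic mean of the covariance matrices rather than a Riemannian mean, and the clustering and SVM stages are omitted; the function names and random data are illustrative only.

```python
import numpy as np


def _spd_apply(mat, fn):
    """Apply fn to the eigenvalues of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T


def tangent_map(cov, ref):
    """Project SPD matrix `cov` onto the tangent space at `ref`.

    Computes log(ref^{-1/2} cov ref^{-1/2}) and returns its upper
    triangle as a feature vector, with off-diagonal entries scaled by
    sqrt(2) so the vector norm equals the matrix Frobenius norm.
    """
    inv_sqrt = _spd_apply(ref, lambda v: 1.0 / np.sqrt(v))
    log_mat = _spd_apply(inv_sqrt @ cov @ inv_sqrt, np.log)
    rows, cols = np.triu_indices_from(log_mat)
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return scale * log_mat[rows, cols]


# Random stand-in data: 5 trials, 4 channels, 200 samples each.
rng = np.random.default_rng(0)
covs = []
for _ in range(5):
    x = rng.standard_normal((4, 200))
    covs.append(x @ x.T / x.shape[1])

# Assumption: arithmetic mean used in place of the Riemannian mean.
ref = np.mean(covs, axis=0)
features = np.array([tangent_map(c, ref) for c in covs])
print(features.shape)  # (5, 10)
```

The resulting rows are ordinary Euclidean vectors, which is what makes a standard classifier such as an SVM applicable afterwards.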

artificial intelligence, machine learning, riemannian manifold, (14 more...)

arXiv.org Machine Learning

1712.00465

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Predicting Adolescent Suicide Attempts with Neural Networks

Bhat, Harish S., Goldman-Mellor, Sidra J.

arXiv.org Machine Learning | Dec-1-2017

Though suicide is a major public health problem in the US, machine learning methods are not commonly used to predict an individual's risk of attempting/committing suicide. In the present work, starting with an anonymized collection of electronic health records for 522,056 unique, California-resident adolescents, we develop neural network models to predict suicide attempts. We frame the problem as a binary classification problem in which we use a patient's data from 2006-2009 to predict either the presence (1) or absence (0) of a suicide attempt in 2010. After addressing issues such as severely imbalanced classes and the variable length of a patient's history, we build neural networks with depths varying from two to eight hidden layers. For test set observations where we have at least five ED/hospital visits' worth of data on a patient, our depth-4 model achieves a sensitivity of 0.703, specificity of 0.980, and AUC of 0.958.
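For readers less familiar with the metrics quoted in this abstract, the sketch below shows how sensitivity, specificity, and AUC are computed for a binary classifier. The labels and scores are made-up toy values, not data from the paper, and the pairwise AUC is a simple quadratic-time formulation chosen for clarity.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example: 3 positives and 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.1, 0.2, 0.1, 0.4, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(sens, 3), round(spec, 3), round(auc(y_true, scores), 3))
```

With severely imbalanced classes, as in the abstract, a model that always predicts 0 would have high accuracy but zero sensitivity, which is why these metrics are reported instead of accuracy alone.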

artificial intelligence, machine learning, suicide attempt, (17 more...)

arXiv.org Machine Learning

1711.10057

Country: North America > United States > California > Merced County > Merced (0.14)

Genre: Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Implementing Machine Learning Using Python and Scikit-learn

#artificialintelligence | Nov-30-2017, 11:45:24 GMT

For machine learning, you can also use these libraries to build learning models. However, doing so requires that you have a strong appreciation of the mathematical foundation for the various machine learning algorithms.

artificial intelligence, dataset, machine learning, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.33)

[D] Weighing softmax predictions based on the validation set confusion matrix, does it make sense? • r/MachineLearning

@machinelearnbot | Nov-30-2017, 03:05:09 GMT

Suppose I have a classification neural net for which I compute the confusion matrix on the validation set after my network has converged. What ways are there of using this matrix to reliably increase the accuracy on unseen data? I know of setting a per-class minimum confidence threshold. But would it make sense to reweight the softmax predictions, knowing that some class A is often misclassified as B by the network, etc.?
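One way to make the question concrete is the heuristic below. It column-normalizes the validation confusion matrix to estimate P(true class | predicted class) and mixes the softmax output through that estimate. This is only a sketch of one possible reading of the idea, not an established method; the function name and the numbers are invented, and whether it helps would need to be checked on data not used to build the matrix.

```python
import numpy as np


def reweight(softmax_probs, confusion):
    """Adjust softmax output using a validation confusion matrix.

    confusion[i, j] = number of validation samples with true class i
    that the network predicted as class j. Assumes every class was
    predicted at least once, so no column sums to zero.
    """
    confusion = np.asarray(confusion, dtype=float)
    # Column j becomes the distribution of true classes among
    # validation samples predicted as class j.
    p_true_given_pred = confusion / confusion.sum(axis=0, keepdims=True)
    adjusted = p_true_given_pred @ np.asarray(softmax_probs, dtype=float)
    return adjusted / adjusted.sum()


# Toy case: class A (index 0) is often misclassified as B (index 1).
confusion = [[30, 20],
             [0, 50]]
softmax_probs = [0.45, 0.55]  # raw network output favors B

adjusted = reweight(softmax_probs, confusion)
print(adjusted, int(np.argmax(adjusted)))
```

In this toy case the adjustment flips a narrow B prediction to A, because 20 of the 70 validation samples predicted as B were actually A.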

artificial intelligence, machine learning, softmax prediction, (3 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.74)

An Improved Naive Bayes Classifier-based Noise Detection Technique for Classifying User Phone Call Behavior

Sarker, Iqbal H., Kabir, Muhammad Ashad, Colman, Alan, Han, Jun

arXiv.org Machine Learning | Nov-30-2017

The presence of noisy instances in mobile phone data is a fundamental issue for classifying user phone call behavior (i.e., accept, reject, missed and outgoing), with many potential negative consequences. The classification accuracy may decrease and the complexity of the classifiers may increase due to the number of redundant training samples. To detect such noisy instances in a training dataset, researchers use the naive Bayes classifier (NBC), as it identifies misclassified instances by taking into account the independence assumption and the conditional probabilities of the attributes. However, some of these misclassified instances might indicate usage behavior patterns of individual mobile phone users. Existing naive Bayes classifier based noise detection techniques have not considered this issue and, thus, are lacking in classification accuracy. In this paper, we propose an improved noise detection technique based on the naive Bayes classifier for effectively classifying users' phone call behaviors. In order to improve the classification accuracy, we effectively identify noisy instances in the training dataset by analyzing the behavioral patterns of individuals. We dynamically determine a noise threshold according to an individual's unique behavioral patterns by using both the naive Bayes classifier and the Laplace estimator. We use this noise threshold to identify noisy instances. To measure the effectiveness of our technique in classifying user phone call behavior, we employ a popular classification algorithm (a decision tree). Experimental results on a real phone call log dataset show that our proposed technique more accurately identifies the noisy instances in the training datasets, which leads to better classification accuracy.
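The baseline this abstract builds on, a naive Bayes classifier with a Laplace estimator that flags training instances it misclassifies, can be sketched as follows. The features, labels, and function names are invented for illustration, and the paper's per-user dynamic noise threshold is not reproduced here; the sketch simply lists noise candidates.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical naive Bayes model; alpha=1 is the Laplace estimator."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(i, y)][v] += 1
            values[i].add(v)

    def log_posterior(row, y):
        lp = math.log(class_counts[y] / len(labels))
        for i, v in enumerate(row):
            num = feature_counts[(i, y)][v] + alpha
            den = class_counts[y] + alpha * len(values[i])
            lp += math.log(num / den)
        return lp

    def predict(row):
        return max(class_counts, key=lambda y: log_posterior(row, y))

    return predict


# Hypothetical call log: (day type, time of day) -> user action.
rows = [("weekday", "am")] * 4 + [("weekend", "pm")] * 4
labels = ["reject", "reject", "reject", "accept",
          "accept", "accept", "accept", "accept"]

predict = train_nb(rows, labels)
candidates = [i for i, (r, y) in enumerate(zip(rows, labels)) if predict(r) != y]
print(candidates)  # [3]
```

Instance 3 is flagged because the user accepted a call in a context where they usually reject. As the abstract notes, such an instance may be genuine individual behavior rather than noise, which is the motivation for a per-user threshold instead of discarding every misclassified instance.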

artificial intelligence, machine learning, probability, (14 more...)

arXiv.org Machine Learning

1710.04461

Country:

North America (0.46)
Oceania > Australia (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Learning Certifiably Optimal Rule Lists for Categorical Data

Angelino, Elaine, Larus-Stone, Nicholas, Alabi, Daniel, Seltzer, Margo, Rudin, Cynthia

arXiv.org Machine Learning | Nov-30-2017

We present the design and implementation of a custom discrete optimization technique for building rule lists over a categorical feature space. Our algorithm produces rule lists with optimal training performance, according to the regularized empirical risk, with a certificate of optimality. By leveraging algorithmic bounds, efficient data structures, and computational reuse, we achieve several orders of magnitude speedup in time and a massive reduction of memory consumption. We demonstrate that our approach produces optimal rule lists on practical problems in seconds. Our results indicate that it is possible to construct optimal sparse rule lists that are approximately as accurate as the COMPAS proprietary risk prediction tool on data from Broward County, Florida, but that are completely interpretable. This framework is a novel alternative to CART and other decision tree methods for interpretable modeling.
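To make the terms in this abstract concrete, the sketch below shows a rule list as an ordered sequence of if-then rules with a default label, and evaluates a regularized empirical risk for it. The specific objective (misclassification rate plus a penalty `lam` per rule) is an assumption for illustration, the data is a toy example, and the certified search over rule lists described in the abstract is not implemented.

```python
def predict(rule_list, default, x):
    """Return the label of the first rule whose conditions all match x."""
    for condition, label in rule_list:
        if all(x.get(k) == v for k, v in condition.items()):
            return label
    return default


def objective(rule_list, default, X, y, lam=0.01):
    """Assumed regularized risk: error rate + lam * number of rules."""
    errors = sum(predict(rule_list, default, x) != t for x, t in zip(X, y))
    return errors / len(y) + lam * len(rule_list)


# Toy categorical data (hypothetical features).
X = [
    {"color": "red", "size": "big"},
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "big"},
    {"color": "blue", "size": "small"},
]
y = [1, 1, 1, 0]

two_rules = [({"color": "red"}, 1), ({"size": "big"}, 1)]
one_rule = [({"color": "red"}, 1)]

print(objective(two_rules, 0, X, y))
print(objective(one_rule, 0, X, y))
```

Here the longer list classifies all four points correctly and pays for two rules, while the shorter list pays for one rule plus one error. The penalty weight controls how much accuracy a shorter, more interpretable list is allowed to give up.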

artificial intelligence, machine learning, rule list, (21 more...)

arXiv.org Machine Learning

1704.01701

Country:

North America > United States > California (0.45)
North America > United States > Florida > Broward County (0.24)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.45)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government (0.68)
Law > Criminal Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(3 more...)

Beyond Parity: Fairness Objectives for Collaborative Filtering

Yao, Sirui, Huang, Bert

arXiv.org Artificial Intelligence | Nov-30-2017

We study fairness in collaborative-filtering recommender systems, which are sensitive to discrimination that exists in historical data. Biased data can lead collaborative-filtering methods to make unfair predictions for users from minority groups. We identify the insufficiency of existing fairness metrics and propose four new metrics that address different forms of unfairness. These fairness metrics can be optimized by adding fairness terms to the learning objective. Experiments on synthetic and real data show that our new metrics can better measure fairness than the baseline, and that the fairness objectives effectively help reduce unfairness.

artificial intelligence, machine learning, unfairness, (16 more...)

arXiv.org Artificial Intelligence

1705.08804

Country: North America > United States > Virginia (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Education (1.00)
Government > Regional Government (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
