AITopics | Performance Analysis

Explaining Black-box Android Malware Detection

Melis, Marco, Maiorca, Davide, Biggio, Battista, Giacinto, Giorgio, Roli, Fabio

arXiv.org Machine Learning, Mar-9-2018

Machine-learning models have recently been used for detecting malicious Android applications, reporting impressive performance on benchmark datasets, even when trained only on features statically extracted from the application, such as system calls and permissions. However, recent findings have highlighted the fragility of such in-vitro evaluations with benchmark datasets, showing that very few changes to the content of Android malware may suffice to evade detection. How can we thus trust that a malware detector performing well on benchmark data will continue to do so when deployed in an operating environment? To mitigate this issue, the most popular Android malware detectors use linear, explainable machine-learning models to easily identify the most influential features contributing to each decision. In this work, we generalize this approach to any black-box machine-learning model, by leveraging a gradient-based approach to identify the most influential local features. This enables using nonlinear models to potentially increase accuracy without sacrificing interpretability of decisions. Our approach also highlights the global characteristics learned by the model to discriminate between benign and malware applications. Finally, as shown by our empirical analysis on a popular Android malware detection task, it also helps identify potential vulnerabilities of linear and nonlinear models against adversarial manipulations.
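To make the idea of gradient-based local explanations concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact relevance formula, so multiplying the gradient by the input and normalizing by the L1 norm is an assumption, as are the toy detector, the feature names, and the finite-difference gradient used to keep the model a black box.

```python
import math
from typing import Callable, List, Sequence, Tuple


def numerical_gradient(f: Callable[[Sequence[float]], float],
                       x: Sequence[float], eps: float = 1e-4) -> List[float]:
    """Central finite-difference estimate of df/dx_i (needs only black-box access to f)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def top_features(f: Callable[[Sequence[float]], float], x: Sequence[float],
                 names: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Rank features by |gradient * input|, normalized so absolute scores sum to 1.

    Assumption: gradient-times-input is one common relevance choice; the
    abstract does not state which formula the paper uses.
    """
    grad = numerical_gradient(f, x)
    relevance = [g * xi for g, xi in zip(grad, x)]
    total = sum(abs(r) for r in relevance) or 1.0
    scored = [(n, r / total) for n, r in zip(names, relevance)]
    return sorted(scored, key=lambda p: abs(p[1]), reverse=True)[:k]


def toy_detector(x: Sequence[float]) -> float:
    """Hypothetical nonlinear malware score in (0, 1); stands in for any black-box model."""
    z = 2.0 * x[0] - 1.5 * x[1] + 0.5 * x[0] * x[2]
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical binary features for one application (1.0 = present).
FEATURES = ["SEND_SMS permission", "INTERNET permission", "getDeviceId call"]
sample = [1.0, 1.0, 0.0]
ranking = top_features(toy_detector, sample, FEATURES)
for name, score in ranking:
    print(f"{name}: {score:+.3f}")
```

A positive score pushes this sample toward the malware class and a negative score toward the benign class; features absent from the sample receive zero relevance under this particular choice.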

android, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1803.03544

Country: Europe (0.46)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

Influence of the Event Rate on Discrimination Abilities of Bankruptcy Prediction Models

Zhang, Lili, Priestley, Jennifer, Ni, Xuelei

arXiv.org Machine Learning, Mar-9-2018

In bankruptcy prediction, the proportion of events is very low, so the event class is often oversampled to reduce the resulting bias. In this paper, we study the influence of the event rate on discrimination abilities of bankruptcy prediction models. First, the statistical association and significance of public records and firmographics indicators with bankruptcy were explored. Then, the event rate was raised by oversampling from 0.12% to 10%, 20%, 30%, 40%, and 50%, respectively. Seven models were developed, including Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine, Bayesian Network, and Neural Network. Under different event rates, models were comprehensively evaluated and compared based on Kolmogorov-Smirnov Statistic, accuracy, F1 score, Type I error, Type II error, and ROC curve on the hold-out dataset with their best probability cut-offs. Results show that Bayesian Network is the most insensitive to the event rate, while Support Vector Machine is the most sensitive.
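The two mechanics mentioned in this abstract, oversampling events to a target rate and scoring a model with the Kolmogorov-Smirnov statistic, can be sketched in plain Python. This is an illustrative sketch only: the abstract does not say how the oversampling was performed, so resampling events with replacement is an assumption, and the function names and toy data are hypothetical.

```python
import random
from typing import List, Sequence


def oversample_events(labels: Sequence[int], target_rate: float, seed: int = 0) -> List[int]:
    """Return row indices in which events (label 1) make up about target_rate.

    Non-events are kept as they are; events are drawn with replacement.
    Requires 0 < target_rate < 1 and at least one event.
    """
    rng = random.Random(seed)
    events = [i for i, y in enumerate(labels) if y == 1]
    non_events = [i for i, y in enumerate(labels) if y == 0]
    # Solve n_e / (n_e + n_0) = r for n_e.
    n_events = round(target_rate * len(non_events) / (1.0 - target_rate))
    return non_events + [rng.choice(events) for _ in range(n_events)]


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest gap between the empirical score CDFs of events and non-events.

    Requires at least one event and one non-event.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= t) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy data: 2 events among 100 rows (a 2% event rate).
labels = [1] * 2 + [0] * 98
idx = oversample_events(labels, target_rate=0.5)
new_rate = sum(labels[i] for i in idx) / len(idx)

# Perfectly separated scores give the maximum KS value.
ks_perfect = ks_statistic([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])
print(new_rate, ks_perfect)
```

Oversampling should be applied to the training portion only, so that the hold-out set keeps the original event rate when metrics such as KS are computed.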

artificial intelligence, machine learning, prediction, (15 more...)

arXiv.org Machine Learning

doi: 10.5121/ijdms.2018.10101

1803.03756

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.90)

Industry: Banking & Finance (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.72)

Joint Estimation and Inference for Data Integration Problems based on Multiple Multi-layered Gaussian Graphical Models

Majumdar, Subhabrata, Michailidis, George

arXiv.org Machine Learning, Mar-8-2018

Aberrations in complex biological systems develop in the background of diverse genetic and environmental factors and are associated with multiple complex molecular events. These include changes in the genome, transcriptome, proteome and metabolome, as well as epigenetic effects. Advances in high-throughput profiling techniques have enabled a systematic and comprehensive exploration of the genetic and epigenetic basis of various diseases, including cancer (Kaushik et al., 2016; Lee et al., 2016), diabetes (Sas et al., 2018; Yuan et al., 2014), chronic kidney disease (Atzler et al., 2014), etc. Further, such multi-Omics collections have become available for patients belonging to different, but related disease subtypes, with The Cancer Genome Atlas (TCGA: Tomczak et al. (2015)) being a prototypical one. Hence, there is an increasing need for models that can integrate such complex data both vertically across multiple modalities and horizontally across different disease subtypes. Figure 1 provides a schematic representation of the horizontal and vertical structure of such heterogeneous multi-modal Omics data as outlined above.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1803.03348

Country: North America > United States (0.27)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

A Bayesian and Machine Learning approach to estimating Influence Model parameters for IM-RO

Lawrence, Trisha

arXiv.org Machine Learning, Mar-8-2018

The rise of Online Social Networks (OSNs) has attracted enormous interest from advertisers and researchers seeking to capitalize on their features. Researchers aim to develop strategies for determining how information is propagated among users within an OSN, which is captured by diffusion or influence models. We consider the influence models for the IM-RO problem, a novel formulation of the Influence Maximization (IM) problem based on implementing Stochastic Dynamic Programming (SDP). In contrast to existing approaches involving influence spread and the theory of submodular functions, the SDP method focuses on optimizing clicks and ultimately revenue to advertisers in OSNs. Existing approaches to influence maximization have been actively researched over the past decade, with applications to multiple fields; however, our approach is a more practical variant of the original IM problem. In this paper, we provide an analysis of the influence models of the IM-RO problem by conducting experiments on synthetic and real-world datasets. We propose a Bayesian and Machine Learning approach for estimating the parameters of the influence models for the Influence Maximization-Revenue Optimization (IM-RO) problem. We present a Bayesian hierarchical model and implement the well-known Naive Bayes classifier (NBC), Decision Trees classifier (DTC) and Random Forest classifier (RFC) on three real-world datasets. Compared to previous approaches to estimating influence model parameters, our strategy has the great advantage of being directly implementable in standard software packages such as WinBUGS/OpenBUGS/JAGS and Apache Spark. We demonstrate the efficiency and usability of our methods in terms of spreading information and generating revenue for advertisers in the context of OSNs.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

1803.03191

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Penalizing Unfairness in Binary Classification

Bechavod, Yahav, Ligett, Katrina

arXiv.org Machine Learning, Mar-8-2018

We present a new approach for mitigating unfairness in learned classifiers. In particular, we focus on binary classification tasks over individuals from two populations, where, as our criterion for fairness, we wish to achieve similar false positive rates in both populations, and similar false negative rates in both populations. As a proof of concept, we implement our approach and empirically evaluate its ability to achieve both fairness and accuracy, using datasets from the fields of criminal risk assessment, credit, lending, and college admissions.
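The fairness criterion described here (similar false positive rates and similar false negative rates across two populations) can be measured as follows. This is a hedged sketch: the abstract does not give the form of the penalty, so adding a weighted sum of the two rate gaps to the error rate is an assumption, and all names and data are hypothetical.

```python
from typing import Sequence, Tuple


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) for binary labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = len(y_true) - negatives
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def unfairness_gaps(y_true: Sequence[int], y_pred: Sequence[int],
                    group: Sequence[int]) -> Tuple[float, float]:
    """Absolute FPR gap and FNR gap between group 0 and group 1."""
    rates = []
    for g in (0, 1):
        t = [yt for yt, gi in zip(y_true, group) if gi == g]
        p = [yp for yp, gi in zip(y_pred, group) if gi == g]
        rates.append(error_rates(t, p))
    return abs(rates[0][0] - rates[1][0]), abs(rates[0][1] - rates[1][1])


def penalized_objective(y_true: Sequence[int], y_pred: Sequence[int],
                        group: Sequence[int], weight: float = 1.0) -> float:
    """Misclassification rate plus weight * (FPR gap + FNR gap).

    Assumption: this additive form is illustrative, not the paper's penalty.
    """
    error = sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)
    fpr_gap, fnr_gap = unfairness_gaps(y_true, y_pred, group)
    return error + weight * (fpr_gap + fnr_gap)


# Toy predictions for two populations of four individuals each.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
group = [0, 0, 0, 0, 1, 1, 1, 1]
gaps = unfairness_gaps(y_true, y_pred, group)
objective = penalized_objective(y_true, y_pred, group, weight=1.0)
print(gaps, objective)
```

In this toy case both groups have the same overall error rate, yet one group absorbs all false positives and the other all false negatives, which is exactly the kind of disparity an accuracy-only objective would miss.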

artificial intelligence, classifier, machine learning, (15 more...)

arXiv.org Machine Learning

1707.00044

Country:

North America > United States (1.00)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.84)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.49)
Education > Educational Setting > Higher Education (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

WWE Fastlane 2018: Predictions, Matches For Last 'SmackDown' PPV Before WrestleMania 34

International Business Times, Mar-7-2018, 18:08:57 GMT

It isn't likely that any championships will change hands, though the matches in Columbus, Ohio will help advance WrestleMania storylines. Below are predictions for every match on the WWE Fastlane card. From the moment Shinsuke Nakamura won the Royal Rumble, it's been clear that he would face AJ Styles at WrestleMania in a rematch from Wrestle Kingdom 10. That means Styles will find a way to retain the title. Another WrestleMania feud could be born from the championship match, though don't be surprised to see Owens and Zayn team up after having their differences over the past few weeks.

artificial intelligence, machine learning, prediction, (10 more...)

International Business Times

Country: North America > United States > Ohio > Franklin County > Columbus (0.26)

Industry: Leisure & Entertainment > Sports > Martial Arts (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.44)

A bag-to-class divergence approach to multiple-instance learning

Møllersen, Kajsa, Hardeberg, Jon Yngve, Godtliebsen, Fred

arXiv.org Machine Learning, Mar-7-2018

In multi-instance (MI) learning, each object (bag) consists of multiple feature vectors (instances), and is most commonly regarded as a set of points in a multidimensional space. A different viewpoint is that the instances are realisations of random vectors with corresponding probability distribution, and that a bag is the distribution, not the realisations. In MI classification, each bag in the training set has a class label, but the instances are unlabelled. By introducing the probability distribution space to bag-level classification problems, dissimilarities between probability distributions (divergences) can be applied. The bag-to-bag Kullback-Leibler information is asymptotically the best classifier, but the typical sparseness of MI training sets is an obstacle. We introduce bag-to-class divergence to MI learning, emphasising the hierarchical nature of the random vectors that makes bags from the same class different. We propose two properties for bag-to-class divergences, and an additional property for sparse training sets.
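As a rough illustration of treating a bag as a distribution and comparing it with each class, the Python sketch below fits a one-dimensional Gaussian to a bag and to the pooled instances of each class, then assigns the bag to the class with the smallest Kullback-Leibler divergence. Plain Gaussian KL is used here as a stand-in: the divergence the authors actually propose is not specified in the abstract, and the class names and data are hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import Dict, Sequence, Tuple


def gaussian_kl(m1: float, s1: float, m2: float, s2: float) -> float:
    """KL( N(m1, s1^2) || N(m2, s2^2) ) for one-dimensional Gaussians."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


def fit_gaussian(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation, with a floor to avoid a zero variance."""
    return fmean(values), max(pstdev(values), 1e-6)


def classify_bag(bag: Sequence[float], class_instances: Dict[str, Sequence[float]]) -> str:
    """Assign the bag to the class whose pooled instance distribution is closest in KL."""
    bag_mean, bag_std = fit_gaussian(bag)
    return min(
        class_instances,
        key=lambda c: gaussian_kl(bag_mean, bag_std, *fit_gaussian(class_instances[c])),
    )


# Hypothetical pooled training instances per class (one feature per instance).
classes = {
    "class_a": [0.0, 0.2, -0.1, 0.1, 0.3, -0.2],
    "class_b": [2.0, 2.3, 1.8, 2.1, 2.4, 1.9],
}
new_bag = [1.9, 2.2, 2.0]
predicted = classify_bag(new_bag, classes)
print(predicted)
```

Pooling all instances of a class gives many more points per density estimate than a single bag does, which is one way to read the sparseness concern raised in the abstract.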

artificial intelligence, divergence, machine learning, (17 more...)

arXiv.org Machine Learning

1803.02782

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Stochastic Block Models with Multiple Continuous Attributes

Stanley, Natalie, Bonacci, Thomas, Kwitt, Roland, Niethammer, Marc, Mucha, Peter J.

arXiv.org Machine Learning, Mar-7-2018

The stochastic block model (SBM) is a probabilistic model for community structure in networks. Typically, only the adjacency matrix is used to perform SBM parameter inference. In this paper, we consider circumstances in which nodes have an associated vector of continuous attributes that are also used to learn the node-to-community assignments and corresponding SBM parameters. While this assumption is not realistic for every application, our model assumes that the attributes associated with the nodes in a network's community can be described by a common multivariate Gaussian model. In this augmented, attributed SBM, the objective is to simultaneously learn the SBM connectivity probabilities with the multivariate Gaussian parameters describing each community. While there are recent examples in the literature that combine connectivity and attribute information to inform community detection, our model is the first augmented stochastic block model to handle multiple continuous attributes. This provides the flexibility in biological data to, for example, augment connectivity information with continuous measurements from multiple experimental modalities. Because the lack of labeled network data often makes community detection results difficult to validate, we highlight the usefulness of our model for two network prediction tasks: link prediction and collaborative filtering. As a result of fitting this attributed stochastic block model, one can predict the attribute vector or connectivity patterns for a new node given the complementary source of information (connectivity or attributes, respectively). We also highlight two biological examples where the attributed stochastic block model provides satisfactory performance in the link prediction and collaborative filtering tasks.
In various applications, each node in a network is equipped with additional information (or particular attributes) that was not implicitly taken into account in the construction of the network.

data mining, machine learning, node, (18 more...)

arXiv.org Machine Learning

1803.02726

Country:

Europe (0.68)
North America > United States > North Carolina (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Optimal Subsampling for Large Sample Logistic Regression

Wang, HaiYing, Zhu, Rong, Ma, Ping

arXiv.org Machine Learning, Mar-7-2018

For massive data, subsampling algorithms are a popular way to reduce the data volume and the computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where statistical leverage scores are often used to define subsampling probabilities. In this paper, we propose fast subsampling algorithms to efficiently approximate the maximum likelihood estimate in logistic regression. We first establish consistency and asymptotic normality of the estimator from a general subsampling algorithm, and then derive optimal subsampling probabilities that minimize the asymptotic mean squared error of the resultant estimator. An alternative minimization criterion is also proposed to further reduce the computational cost. The optimal subsampling probabilities depend on the full data estimate, so we develop a two-step algorithm to approximate the optimal subsampling procedure. This algorithm is computationally efficient and significantly reduces computing time compared with the full data approach. Consistency and asymptotic normality of the estimator from the two-step algorithm are also established. Synthetic and real data sets are used to evaluate the practical performance of the proposed method.
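The two-step idea (a pilot fit on a small uniform subsample, followed by a second subsample drawn with data-dependent probabilities and fitted with inverse-probability weights) might be sketched as below. Several parts are assumptions rather than the paper's method: the abstract does not state the probability formula, so probabilities proportional to |y_i - p_i| times the norm of x_i are used for illustration, the fit uses plain gradient ascent, and the synthetic data and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                          w: Sequence[float], steps: int = 300, lr: float = 0.5) -> List[float]:
    """Gradient ascent on the weighted log-likelihood (weights normalized by their sum)."""
    d = len(X[0])
    beta = [0.0] * d
    total = sum(w)
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi, wi in zip(X, y, w):
            r = wi * (yi - sigmoid(sum(b * v for b, v in zip(beta, xi))))
            for j in range(d):
                grad[j] += r * xi[j]
        beta = [b + lr * g / total for b, g in zip(beta, grad)]
    return beta


def subsampling_probabilities(X: Sequence[Sequence[float]], y: Sequence[int],
                              beta: Sequence[float]) -> List[float]:
    """Probabilities proportional to |y_i - p_i| * ||x_i|| (an illustrative assumption)."""
    scores = []
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        scores.append(abs(yi - p) * math.sqrt(sum(v * v for v in xi)))
    total = sum(scores)
    return [s / total for s in scores]


def two_step_estimate(X, y, pilot_size: int, subsample_size: int, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    n = len(X)
    # Step 1: pilot estimate from a uniform subsample.
    pilot = [rng.randrange(n) for _ in range(pilot_size)]
    beta0 = fit_weighted_logistic([X[i] for i in pilot], [y[i] for i in pilot],
                                  [1.0] * pilot_size)
    # Step 2: draw with data-dependent probabilities, then fit with 1/probability weights.
    probs = subsampling_probabilities(X, y, beta0)
    chosen = rng.choices(range(n), weights=probs, k=subsample_size)
    weights = [1.0 / probs[i] for i in chosen]
    return fit_weighted_logistic([X[i] for i in chosen], [y[i] for i in chosen], weights)


# Synthetic data: intercept column plus one Gaussian covariate.
data_rng = random.Random(1)
X = [[1.0, data_rng.gauss(0.0, 1.0)] for _ in range(2000)]
y = [1 if data_rng.random() < sigmoid(0.5 + 1.0 * x[1]) else 0 for x in X]
beta_hat = two_step_estimate(X, y, pilot_size=200, subsample_size=400)
print(beta_hat)
```

The inverse-probability weights in the second fit compensate for the non-uniform draw, so the weighted estimate targets the same maximum likelihood estimate as a fit on the full data.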

artificial intelligence, machine learning, mle, (18 more...)

arXiv.org Machine Learning

1702.01166

Country:

Europe (0.67)
North America > United States > Connecticut (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

40 Interview Questions asked at Startups in Machine Learning / Data Science

@machinelearnbot, Mar-6-2018, 19:18:54 GMT

This article was posted by Manish Saraswat on Analytics Vidhya. Manish, who works in marketing and Data Science at Analytics Vidhya, believes that education can change this world. R, Data Science and Machine Learning keep him busy. Machine learning and data science are being looked at as the drivers of the next industrial revolution happening in the world today. This also means that there are numerous exciting startups looking for data scientists.

artificial intelligence, data science, machine learning, (14 more...)

@machinelearnbot

Industry: Health & Medicine > Therapeutic Area > Oncology (0.33)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.78)
