AITopics | Accuracy

Accuracy

Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data

Bullough, Benjamin L., Yanchenko, Anna K., Smith, Christopher L., Zipkin, Joseph R.

arXiv.org Machine Learning, Jul-25-2017

Each year, thousands of software vulnerabilities are discovered and reported to the public. Unpatched known vulnerabilities are a significant security risk. It is imperative that software vendors quickly provide patches once vulnerabilities are known and that users install those patches as soon as they are available. However, most vulnerabilities are never actually exploited. Since writing, testing, and installing software patches can involve considerable resources, it would be desirable to prioritize the remediation of vulnerabilities that are likely to be exploited. Several published research studies have reported moderate success in applying machine learning techniques to the task of predicting whether a vulnerability will be exploited. These approaches typically use features derived from vulnerability databases (such as the summary text describing the vulnerability) or social media posts that mention the vulnerability by name. However, these prior studies share multiple methodological shortcomings that inflate the apparent predictive power of these approaches. We replicate key portions of the prior work, compare their approaches, and show how the selection of training and test data critically affects the estimated performance of predictive models. The results of this study point to important methodological considerations that should be taken into account so that results reflect real-world utility.

data mining, machine learning, vulnerability, (17 more...)

arXiv.org Machine Learning

doi: 10.1145/3041008.3041009

1707.08015

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Accelerating Permutation Testing in Voxel-wise Analysis through Subspace Tracking: A new plugin for SnPM

Gutierrez-Barragan, Felipe, Ithapu, Vamsi K., Hinrichs, Chris, Maumet, Camille, Johnson, Sterling C., Nichols, Thomas E., Singh, Vikas, ADNI, the

arXiv.org Machine Learning, Jul-24-2017

Permutation testing is a non-parametric method for obtaining the max null distribution used to compute corrected $p$-values that provide strong control of false positives. In neuroimaging, however, the computational burden of running such an algorithm can be significant. We find that by viewing the permutation testing procedure as the construction of a very large permutation testing matrix, $T$, one can exploit structural properties derived from the data and the test statistics to reduce the runtime under certain conditions. In particular, we see that $T$ is low-rank plus a low-variance residual. This makes $T$ a good candidate for low-rank matrix completion, where only a very small number of entries of $T$ ($\sim0.35\%$ of all entries in our experiments) have to be computed to obtain a good estimate. Based on this observation, we present RapidPT, an algorithm that efficiently recovers the max null distribution commonly obtained through regular permutation testing in voxel-wise analysis. We present an extensive validation on a synthetic dataset and four datasets of varying size against two baselines: Statistical NonParametric Mapping (SnPM13) and a standard permutation testing implementation (referred to as NaivePT). We find that RapidPT achieves its best runtime performance on medium-sized datasets ($50 \leq n \leq 200$), with speedups of 1.5x - 38x (vs. SnPM13) and 20x - 1000x (vs. NaivePT). For larger datasets ($n \geq 200$), RapidPT outperforms NaivePT (6x - 200x) on all datasets, and provides large speedups over SnPM13 (2x - 15x) when more than 10000 permutations are needed. The implementation is available as a standalone toolbox and is also integrated within SnPM13; it can leverage multi-core architectures when available.
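For reference, the baseline that RapidPT accelerates can be sketched in a few lines of Python. This is a minimal sketch of naive max-statistic permutation testing, not the RapidPT algorithm and not the SnPM13 or NaivePT code; the function names, the difference-of-means statistic, and the toy data are assumptions made for illustration.

```python
import random

def mean_diff(data, labels):
    """Per-voxel difference of group means (label 1 minus label 0)."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    stats = []
    for v in range(len(data[0])):
        s1 = sum(row[v] for row, g in zip(data, labels) if g == 1)
        s0 = sum(row[v] for row, g in zip(data, labels) if g == 0)
        stats.append(s1 / n1 - s0 / n0)
    return stats

def max_null_distribution(data, labels, n_perm, seed=0):
    """Max statistic over voxels for each random relabeling of the subjects."""
    rng = random.Random(seed)
    perm = list(labels)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(perm)
        max_stats.append(max(mean_diff(data, perm)))
    return max_stats

def corrected_p_values(observed, max_stats):
    """Corrected p-value per voxel: share of max-null draws at or above it."""
    n = len(max_stats)
    return [(1 + sum(m >= t for m in max_stats)) / (n + 1) for t in observed]

# Toy data (hypothetical): 20 subjects, 5 voxels, small deterministic "noise".
labels = [0] * 10 + [1] * 10
data = [[((7 * i + 3 * v) % 5) / 10.0 for v in range(5)] for i in range(20)]
for i in range(10, 20):
    data[i][0] += 5.0  # plant a group effect in voxel 0 only

observed = mean_diff(data, labels)
null_max = max_null_distribution(data, labels, n_perm=200)
p_corrected = corrected_p_values(observed, null_max)
```

Each permutation recomputes the statistic at every voxel, so the full procedure fills a permutations-by-voxels matrix (the $T$ of the abstract) only to keep one maximum per row. That cost is what motivates estimating $T$ from a small sample of its entries.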

artificial intelligence, machine learning, rapidpt, (16 more...)

arXiv.org Machine Learning

doi: 10.1016/j.neuroimage.2017.07.025

1703.01506

Country: North America > United States > California (0.28)

Genre:

Research Report > New Finding (0.89)
Research Report > Experimental Study (0.71)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Health Analytics: a systematic review of approaches to detect phenotype cohorts using electronic health records

Hiob, Norman, Lessmann, Stefan

arXiv.org Machine Learning, Jul-24-2017

The paper presents a systematic review of state-of-the-art approaches to identify patient cohorts using electronic health records. It gives a comprehensive overview of the most commonly detected phenotypes and their underlying data sets. Special attention is given to preprocessing of input data and the different modeling approaches. The literature review confirms natural language processing to be a promising approach for electronic phenotyping. However, accessibility and the lack of natural language processing standards for medical texts remain a challenge. Future research should develop such standards and further investigate which machine learning approaches are best suited to which type of medical data.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

1707.07425

Country: North America > United States (0.47)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > Promising Solution (0.86)
Research Report > New Finding (0.69)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies

Keys, Kevin L., Chen, Gary K., Lange, Kenneth

arXiv.org Machine Learning, Jul-24-2017

A genome-wide association study (GWAS) correlates marker variation with trait variation in a sample of individuals. Each study subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here we assume that subjects are unrelated and collected at random and that trait values are normally distributed or transformed to normality. Over the past decade, researchers have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges. Penalized regression with LASSO or MCP penalties is capable of selecting a handful of associated SNPs from millions of potential SNPs. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinning of a trait. This paper introduces the iterative hard thresholding (IHT) algorithm to the GWAS analysis of continuous traits. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to leverage commodity desktop computers in GWAS analysis and to avoid supercomputing. We evaluate IHT performance on both simulated and real GWAS data and conclude that it reduces false positive and false negative rates while remaining competitive in computational time with penalized regression. Source code is freely available at https://github.com/klkeys/IHT.jl.
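The core IHT update is a gradient step on the least-squares loss followed by keeping only the $k$ largest coefficients in magnitude. The following is a minimal Python sketch of that update, not the authors' IHT.jl implementation: it omits genotype compression and parallelism, and the function names, the conservative fixed step size, and the toy orthogonal design are assumptions chosen so the example is easy to check by hand.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude coefficients and zero out the rest."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]

def iht(X, y, k, n_iter=100):
    """Iterative hard thresholding for least squares with at most k predictors."""
    n, p = len(X), len(X[0])
    # Assumed step size: 1 / ||X||_F^2, which never exceeds 1 / lambda_max(X^T X).
    step = 1.0 / sum(x * x for row in X for x in row)
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        beta = hard_threshold([beta[j] + step * grad[j] for j in range(p)], k)
    return beta

# Toy design (hypothetical): 8 samples, 4 mutually orthogonal +/-1 predictors.
X = [
    [1, 1, 1, 1], [-1, 1, -1, 1], [1, -1, -1, 1], [-1, -1, 1, 1],
    [1, 1, 1, -1], [-1, 1, -1, -1], [1, -1, -1, -1], [-1, -1, 1, -1],
]
beta_true = [2.0, 0.0, -3.0, 0.0]
y = [sum(xij * bj for xij, bj in zip(row, beta_true)) for row in X]

beta_hat = iht(X, y, k=2)
support = [j for j, b in enumerate(beta_hat) if b != 0.0]
```

Unlike a LASSO or MCP penalty, the sparsity level `k` is set directly here rather than through a penalty weight, so the selected coefficients are not shrunk toward zero.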

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1608.01398

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > San Francisco County > San Francisco (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

WWE Battleground 2017: Live Stream Info, Start Time, For PPV Before SummerSlam

International Business Times, Jul-22-2017, 15:40:06 GMT

Some of the final pieces for SummerSlam 2017 will fall into place Sunday night with the last "SmackDown Live" pay-per-view prior to the "Big Four" event. WWE Battleground 2017 will feature seven matches on the card with three championships on the line. Battleground 2017 is scheduled to start at 8 p.m. EDT, and the pre-show gets underway at 7:30 p.m. EDT. Ordering the event on PPV costs $54.99, but fans can also watch the event with a live stream on the WWE Network. A subscription to the network costs $9.99 per month, though new subscribers get the first month free.

artificial intelligence, machine learning, wwe battleground 2017, (14 more...)

International Business Times

Country: North America > United States > New York (0.06)

Industry: Leisure & Entertainment > Sports > Martial Arts (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

WWE Battleground 2017: Predictions, Match Card For 'SmackDown Live' PPV Before SummerSlam

International Business Times, Jul-19-2017, 17:33:57 GMT

The final WWE pay-per-view before SummerSlam is set for Sunday night in Philadelphia with WWE Battleground 2017. Jinder Mahal will defend his WWE Championship against Randy Orton in the main event, and two more championships will be on the line. Below are predictions for every match on the card, which will feature only members of the "SmackDown Live" roster. It was shocking for Jinder to win the WWE Championship shortly after WrestleMania 33, but it would make little sense for him to lose the title at this point. He's already beaten Orton twice, and their feud should come to an end at Battleground.

machine learning, prediction, summerslam, (15 more...)

International Business Times

Country:

North America > United States > New York (0.06)
North America > United States > California > Los Angeles County > Los Angeles (0.06)

Industry: Leisure & Entertainment > Sports > Martial Arts (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Boolean kernels for collaborative filtering in top-N item recommendation

Polato, Mirko, Aiolli, Fabio

arXiv.org Artificial Intelligence, Jul-18-2017

In many personalized recommendation problems, available data consists only of positive interactions (implicit feedback) between users and items. This problem is also known as One-Class Collaborative Filtering (OC-CF). Linear models usually achieve state-of-the-art performance on OC-CF problems, and many efforts have been devoted to building more expressive and complex representations able to improve the recommendations. Recent analyses show that collaborative filtering (CF) datasets have peculiar characteristics such as high sparsity and a long-tailed distribution of the ratings. In this paper we propose a boolean kernel, called the Disjunctive kernel, which is less expressive than the linear one but is able to alleviate the sparsity issue in CF contexts. The embedding of this kernel is composed of all the combinations of a certain arity d of the input variables, and these combined features are semantically interpreted as disjunctions of the input variables. Experiments on several CF datasets show the effectiveness and the efficiency of the proposed kernel.

Keywords: Boolean kernel, Kernel methods, Recommender systems, Collaborative filtering, Implicit feedback

1. Introduction. Collaborative Filtering (CF) is the de facto approach for making personalized recommendations. CF techniques exploit historical information about the user-item interactions in order to improve future recommendations to users. User-item interactions can be of two types: explicit or implicit.
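The embedding described in the abstract (one feature per combination of d input variables, each read as a disjunction) can be made concrete with a small Python sketch. The brute-force function follows that description directly. The closed form is an inclusion-exclusion count derived here from the same description rather than quoted from the paper, and the function names and toy vectors are assumptions for illustration.

```python
from itertools import combinations
from math import comb

def disjunctive_kernel_bruteforce(x, z, d):
    """Count the d-subsets of variables whose OR is true in both x and z."""
    count = 0
    for idx in combinations(range(len(x)), d):
        if any(x[i] for i in idx) and any(z[i] for i in idx):
            count += 1
    return count

def disjunctive_kernel_closed_form(x, z, d):
    """Same count by inclusion-exclusion, without enumerating the subsets."""
    n = len(x)
    nx, nz = sum(x), sum(z)                   # active variables in x and in z
    nxz = sum(a & b for a, b in zip(x, z))    # active in both (dot product)
    # all subsets - (OR false in x) - (OR false in z) + (OR false in both)
    return comb(n, d) - comb(n - nx, d) - comb(n - nz, d) + comb(n - nx - nz + nxz, d)

# Toy binary interaction vectors (hypothetical): 5 items, arity d = 2.
x = [1, 0, 0, 1, 0]
z = [0, 0, 1, 1, 0]
k_brute = disjunctive_kernel_bruteforce(x, z, 2)
k_closed = disjunctive_kernel_closed_form(x, z, 2)   # both evaluate to 5
```

Two vectors that share no item can still have a nonzero kernel value (any subset touching one active item of each counts), which is one way to see how a disjunctive feature space can soften sparsity compared with the plain dot product.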

artificial intelligence, kernel, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1612.07025

Country:

North America > United States (0.93)
North America > Canada > British Columbia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Estimation of Large Covariance and Precision Matrices from Temporally Dependent Observations

Shu, Hai, Nan, Bin

arXiv.org Machine Learning, Jul-18-2017

We consider the estimation of large covariance and precision matrices from high-dimensional sub-Gaussian or heavier-tailed observations with slowly decaying temporal dependence. The temporal dependence is allowed to be long-range, with longer memory than that considered in the current literature. We show that several commonly used methods for independent observations can be applied to the temporally dependent data. In particular, the rates of convergence are obtained for the generalized thresholding estimation of covariance and correlation matrices, and for the constrained $\ell_1$ minimization and the $\ell_1$ penalized likelihood estimation of the precision matrix. Properties of sparsistency and sign-consistency are also established. A gap-block cross-validation method is proposed for the tuning parameter selection, which performs well in simulations. As a motivating example, we study the brain functional connectivity using resting-state fMRI time series data with long-range temporal dependence.
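Generalized thresholding estimators act entrywise on a sample covariance matrix. The following is a minimal Python sketch using soft thresholding, one commonly used member of that family. It is not the paper's procedure: the threshold `lam` is fixed by hand instead of being chosen by gap-block cross-validation, leaving the diagonal untouched is an assumption, and the function names and the toy matrix are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; entries within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def threshold_covariance(cov, lam):
    """Apply soft thresholding to off-diagonal entries (diagonal kept as-is)."""
    p = len(cov)
    return [
        [cov[i][j] if i == j else soft_threshold(cov[i][j], lam) for j in range(p)]
        for i in range(p)
    ]

# Toy sample covariance (hypothetical values) and a hand-picked threshold.
sample_cov = [
    [1.00, 0.50, 0.05],
    [0.50, 1.00, -0.30],
    [0.05, -0.30, 1.00],
]
sparse_cov = threshold_covariance(sample_cov, lam=0.1)
```

The small entry 0.05 is set to zero while the larger ones are shrunk by 0.1, which gives a sparse, symmetric estimate.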

artificial intelligence, estimation, machine learning, (13 more...)

arXiv.org Machine Learning

1412.5059

Country: North America > United States (1.00)

Genre: Research Report (0.81)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.45)

Data Science Has Been Using Rebel Statistics for a Long Time

@machinelearnbot, Jul-17-2017, 02:10:06 GMT

Many of those who call themselves statisticians just won't admit that data science heavily relies on and uses (heretical, rule-breaking) statistical science, or they don't recognize the true statistical nature of these data science techniques (some are 15 years old), or are opposed to the modernization of their statistical arsenal. They already missed the train when machine learning became a popular discipline (also heavily based on statistics) more than 15 years ago. Now machine learning professionals, who are statistical practitioners working on problems such as clustering, far outnumber statisticians. Many times, I have interacted with statisticians who think that anyone not calling himself a statistician knows little or nothing about statistics; see my recent bio published here, or visit the LinkedIn profiles of many data scientists, to debunk this myth. Any statistical technique that is not in their old books is considered heretical at best, or non-statistical at worst, or, most of the time, not understood.

artificial intelligence, machine learning, social media, (17 more...)

@machinelearnbot

Genre: Research Report > Experimental Study (0.30)

Industry: Information Technology (0.97)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Sparse Probit Linear Mixed Model

Mandt, Stephan, Wenzel, Florian, Nakajima, Shinichi, Cunningham, John P., Lippert, Christoph, Kloft, Marius

arXiv.org Machine Learning, Jul-17-2017

Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they make it possible to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity and population structure. Formulated as models for linear regression, LMMs have been restricted to continuous phenotypes. We introduce the Sparse Probit Linear Mixed Model (Probit-LMM), where we generalize the LMM modeling paradigm to binary phenotypes. As a technical challenge, the model no longer possesses a closed-form likelihood function. In this paper, we present a scalable approximate inference algorithm that lets us fit the model to high-dimensional data sets. We show on three real-world examples from different domains that, in the setup of binary labels, our algorithm leads to better prediction accuracies and also selects features which show less correlation with the confounding factors.

artificial intelligence, machine learning, sparse probit regression, (16 more...)

arXiv.org Machine Learning

doi: 10.1007/s10994-017-5652-6

1507.04777

Country:

Europe > Germany (0.46)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
