AITopics | Performance Analysis

Performance Analysis

Grouping the executables to detect malware with high accuracy

arXiv.org Artificial Intelligence | Jun-22-2016

Metamorphic malware variants with the same malicious behavior (family) can obfuscate themselves to look different from each other. This variation in structure leads to a huge signature database when traditional signature matching techniques are used to detect them. For effective and efficient detection of malware among large numbers of executables, we need to partition these files into groups that identify their respective families. In addition, the grouping criteria should be chosen so that they can also be applied to classify unknown files encountered on computers. This paper studies malware and benign executables in groups in order to detect unknown malware with high accuracy. We studied the sizes of malware generated by three popular second-generation (metamorphic) malware creator kits, viz. G2, PS-MPC and NGVCK, and observed that the size variation between any two malware samples generated by the same kit is small. Hence, we grouped the executables by size using the Optimal k-Means Clustering algorithm and used the resulting groups to select promising features for training classifiers (Random Forest, J48, LMT, FT and NBT) to detect variants of malware or unknown malware. We find that detecting malware on the basis of their respective file sizes gives accuracy of up to 99.11% with these classifiers.
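The abstract names the Optimal k-Means Clustering algorithm but does not detail it. Because file size is a single number, an exactly optimal k-means partition can be found by dynamic programming over the sorted sizes, since optimal one-dimensional clusters are contiguous runs of the sorted values (the idea behind tools such as the R package Ckmeans.1d.dp). The sketch below assumes that reading; the name `optimal_kmeans_1d` and the simple O(k·n²) formulation are illustrative, not the paper's implementation, and the later feature-selection and classifier steps are not shown.

```python
from typing import List, Sequence, Tuple


def optimal_kmeans_1d(values: Sequence[float], k: int) -> Tuple[List[int], float]:
    """Exact k-means on one feature (e.g. file size) by dynamic programming.

    Returns (cluster label per input value, total within-cluster sum of squares).
    Labels are ordered so that cluster 0 holds the smallest values.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    order = sorted(range(n), key=lambda i: values[i])
    xs = [float(values[i]) for i in order]

    # Prefix sums give the sum of squared deviations of any sorted run in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(a: int, b: int) -> float:
        """Sum of squared deviations of xs[a:b] from their mean (requires b > a)."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    inf = float("inf")
    # cost[c][j]: best SSE when the first j sorted values form c contiguous clusters.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + sse(i, j)
                if cand < cost[c][j]:
                    cost[c][j], start[c][j] = cand, i

    # Walk the stored split points back to assign labels in sorted order.
    labels_sorted = [0] * n
    j = n
    for c in range(k, 0, -1):
        i = start[c][j]
        for t in range(i, j):
            labels_sorted[t] = c - 1
        j = i

    labels = [0] * n
    for pos, idx in enumerate(order):
        labels[idx] = labels_sorted[pos]
    return labels, cost[k][n]
```

For example, `optimal_kmeans_1d([500, 10, 100, 11, 102, 12], 3)` groups the three small files, the two mid-sized files and the single large file separately, returning labels `[2, 0, 1, 0, 1, 0]`.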

artificial intelligence, machine learning, malware

arXiv.org Artificial Intelligence

1606.06908

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

AI Boosts Cancer Screens to Nearly 100 Percent Accuracy

#artificialintelligence | Jun-21-2016, 20:44:51 GMT

Diagnosing cancer is about to get more accurate, with the help of artificial intelligence. Pathologists have diagnosed diseases in more or less the same way for the past 100 years, by laboring over a microscope reviewing biopsy samples on little glass slides. Working almost robotically, they sift through millions of normal cells to identify just a few diseased ones. The task is tedious and prone to human error. But now, scientists and engineers have created a technique that uses artificial intelligence (AI) and can differentiate cancer cells from normal cells almost as well as a top-notch pathologist.

artificial intelligence, cancer cell, machine learning

#artificialintelligence

Country:

Europe > Netherlands (0.05)
Europe > Czechia > Prague (0.05)
Asia > Middle East > Israel (0.05)

Genre: Research Report (0.32)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)

Risk-consistency of cross-validation with lasso-type procedures

Homrighausen, Darren, McDonald, Daniel J.

arXiv.org Machine Learning | Jun-21-2016

The lasso and related sparsity-inducing algorithms have been the target of substantial theoretical and applied research. Correspondingly, many results are known about their behavior for a fixed or optimally chosen tuning parameter specified up to unknown constants. In practice, however, this oracle tuning parameter is inaccessible, so one must use the data to select it. Common statistical practice is to use a variant of cross-validation for this task. However, little is known about the theoretical properties of the resulting predictions with such data-dependent methods. We consider the high-dimensional setting with random design wherein the number of predictors $p$ grows with the number of observations $n$. Under typical assumptions on the data generating process, similar to those in the literature, we recover oracle rates up to a log factor when choosing the tuning parameter with cross-validation. Under weaker conditions, when the true model is not necessarily linear, we show that the lasso remains risk consistent relative to its linear oracle. We also generalize these results to the group lasso and square-root lasso and investigate the predictive and model selection performance of cross-validation via simulation.
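As a concrete picture of the procedure whose theory the paper studies, here is a minimal K-fold cross-validation loop that picks the lasso tuning parameter λ from a grid. It is a sketch under stated assumptions, not the authors' code: the objective is taken as (1/(2n))‖y − Xβ‖² + λ‖β‖₁, the data are assumed centered so no intercept is fitted, the lasso is solved by plain cyclic coordinate descent with a fixed iteration count, and the names `lasso_cd` and `cv_select_lambda` are hypothetical. Pure Python is used so the example needs no third-party packages.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X: Matrix, y: Sequence[float], lam: float, n_iter: int = 200) -> List[float]:
    """Cyclic coordinate descent for (1/(2n))||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def cv_select_lambda(X: Matrix, y: Sequence[float], lambdas: Sequence[float],
                     k: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Return the lambda with the smallest K-fold CV mean squared error, and that error."""
    n = len(X)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_err = lambdas[0], float("inf")
    for lam in lambdas:
        sq_err = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            for i in fold:
                pred = sum(b * x for b, x in zip(beta, X[i]))
                sq_err += (y[i] - pred) ** 2
        mse = sq_err / n
        if mse < best_err:
            best_lam, best_err = lam, mse
    return best_lam, best_err
```

In practice a library implementation (for example scikit-learn's `LassoCV`) would handle standardization, intercepts and convergence checks that this sketch omits.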

artificial intelligence, estimator, machine learning

arXiv.org Machine Learning

1308.081

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (1.00)

Entry Point Data – Using Python's Sci-packages to Prepare Data for Machine Learning Tasks and other

#artificialintelligence | Jun-17-2016, 20:35:42 GMT

In this short tutorial I want to provide an overview of some of my favorite Python tools for common procedures, as entry points for general pattern classification and machine learning tasks and various other data analyses. In this section, I want to recommend a way to install the required Python packages if you have not done so yet; otherwise, you can skip this part. Although the packages can be installed step by step "manually", I highly recommend taking a look at the Anaconda Python distribution for scientific computing. Anaconda is distributed by Continuum Analytics, but it is completely free and, as of today, includes more than 195 packages for science and data analysis.
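If you are unsure whether the packages are already installed (and so whether you can skip the installation step), a quick check from Python itself is possible with `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found. The package list below is an assumption about which libraries a data-preparation workflow would use, and `check_packages` is a hypothetical helper name; note that scikit-learn is imported as `sklearn`.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Hypothetical selection of common scientific packages; adjust to your needs.
    wanted = ["numpy", "scipy", "pandas", "matplotlib", "sklearn"]
    for name, ok in check_packages(wanted).items():
        print(f"{name:<12} {'found' if ok else 'missing'}")
```

Anything reported as missing can then be installed through Anaconda's package manager or pip.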

artificial intelligence, dataset, machine learning

#artificialintelligence

Country:

North America > United States > California > Orange County > Irvine (0.04)
Europe > Italy > Liguria > Genoa (0.04)

Genre: Instructional Material (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

AI creates efficiencies in sanctions checking @Euromoney

#artificialintelligence | Jun-17-2016, 15:20:35 GMT

In transaction banking, the focus on technological development has centred on the possibilities of blockchain technology. However, this has overshadowed the arrival of AI into transaction-banking platforms. AI and machine learning are helping to further reduce manual checks and processes. The first target for implementation is sanctions and compliance. As companies become increasingly international, irrespective of size, checking against sanctions has become an essential activity for more than just the MNCs. AI can learn through experience what can pass through the sanctions filter, and what compliance obligations need to be checked.

artificial intelligence, efficiency, machine learning

#artificialintelligence

Industry: Banking & Finance (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.32)

Huge US facial recognition database flawed: audit

Daily Mail - Science & tech | Jun-16-2016, 17:02:44 GMT

The FBI's facial recognition database has more than 400 million pictures to help its criminal investigations, but lacks adequate safeguards for accuracy and privacy protection, a congressional audit has revealed. The database totals 411.9 million images, and privacy campaigners have slammed the 'unprecedented number of photographs, most of which are of Americans and foreigners who have committed no crimes.' The huge database, which enables investigators to automatically search images for criminal suspects, 'is far greater than had previously been understood' and raises concerns 'about the risk of innocent Americans being inadvertently swept up in criminal investigations,' said Senator Al Franken, who requested the study. The FBI's database includes some 30 million criminal mugshots and 140 million images from visa applications by foreign nationals, the GAO found. It also contains drivers' license pictures from 16 US states and 6.7 million photos from the Defense Department's biometric identification system of individuals detained by US forces abroad, among others.

artificial intelligence, database, machine learning

Daily Mail - Science & tech

Country:

North America > United States > Virginia (0.05)
North America > United States > Texas (0.05)
North America > United States > New York (0.05)

Genre: Research Report (0.52)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Cross-validation in R: a do-it-yourself and a black box approach

@machinelearnbot | Jun-16-2016, 11:41:05 GMT

In my previous post, we saw that R-squared can lead to a misleading interpretation of the quality of our regression fit in terms of predictive power. One thing that R-squared offers no protection against is overfitting. Cross-validation, on the other hand, inherently offers protection against overfitting, because the cases in our test set are different from the cases in our training set. In the type of validation used here (leave-one-out cross-validation), one case in our data set is used as the test set, while the remaining cases are used as the training set. We iterate through the data set until every case has served as the test set.
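The leave-one-out procedure described above can be sketched in a few lines. The post itself works in R; the Python below is only an illustration, assuming a simple linear regression with one predictor, and `fit_line`, `r_squared` and `loocv_mse` are hypothetical names.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """In-sample R-squared of the fitted line (no protection against overfitting)."""
    a, b = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def loocv_mse(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Leave-one-out CV: each case is predicted by a line fitted to all the others."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        train_x = [x for j, x in enumerate(xs) if j != i]
        train_y = [y for j, y in enumerate(ys) if j != i]
        a, b = fit_line(train_x, train_y)
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total / n
```

On the toy data x = [0, 1, 2], y = [0, 0, 3], the in-sample R-squared is 0.75, yet the leave-one-out mean squared error is 6.75, well above the mean squared deviation of y from its mean (2.0): the fit looks reasonable but predicts unseen cases poorly.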

artificial intelligence, machine learning, validation

@machinelearnbot

Industry: Transportation > Air (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.69)

ACDC: $\alpha$-Carving Decision Chain for Risk Stratification

Park, Yubin, Ho, Joyce, Ghosh, Joydeep

arXiv.org Machine Learning | Jun-16-2016

In many healthcare settings, intuitive decision rules for risk stratification can support effective hospital resource allocation. This paper introduces a novel variant of decision tree algorithms that produces a chain of decisions, not a general tree. Our algorithm, $\alpha$-Carving Decision Chain (ACDC), sequentially carves out "pure" subsets of the majority class examples. The resulting chain of decision rules yields a pure subset of the minority class examples. Our approach is particularly effective in exploring large and class-imbalanced health datasets. Moreover, ACDC provides an interactive interpretation in conjunction with visual performance metrics such as the Receiver Operating Characteristic (ROC) curve and the lift chart.
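The abstract's central idea, repeatedly carving out "pure" groups of majority-class examples so that what remains is enriched in the minority class, can be illustrated with a deliberately simplified greedy sketch. It is not the ACDC algorithm: the α parameter and the paper's actual carving criterion are not described in the abstract, so this version assumes strict purity (every carved example must belong to the majority class), single-feature threshold rules, and hypothetical function names `best_pure_carve` and `carve_chain`.

```python
from typing import List, Optional, Sequence, Tuple

Rule = Tuple[int, str, float]  # (feature index, "<=" or ">", threshold)


def matches(row: Sequence[float], rule: Rule) -> bool:
    """True if the example falls on the carved side of the threshold rule."""
    j, op, t = rule
    return row[j] <= t if op == "<=" else row[j] > t


def best_pure_carve(X: List[List[float]], y: List[int], majority: int) -> Optional[Rule]:
    """Largest single-threshold subset containing only majority-class examples."""
    best: Optional[Rule] = None
    best_size = 0
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for op in ("<=", ">"):
                rule = (j, op, t)
                carved = [lab for row, lab in zip(X, y) if matches(row, rule)]
                if carved and len(carved) > best_size and all(lab == majority for lab in carved):
                    best, best_size = rule, len(carved)
    return best


def carve_chain(X: List[List[float]], y: List[int], majority: int,
                max_steps: int = 10) -> Tuple[List[Rule], List[int]]:
    """Apply pure carves in sequence; returns (chain of rules, labels left over)."""
    chain: List[Rule] = []
    X, y = list(X), list(y)
    for _ in range(max_steps):
        if not X or all(lab != majority for lab in y):
            break  # nothing left to carve: remainder is empty or minority-only
        rule = best_pure_carve(X, y, majority)
        if rule is None:
            break
        chain.append(rule)
        kept = [(row, lab) for row, lab in zip(X, y) if not matches(row, rule)]
        X = [row for row, _ in kept]
        y = [lab for _, lab in kept]
    return chain, y
```

On a single feature with values 1 to 6 and labels [0, 0, 1, 1, 0, 0] (majority class 0), the chain first carves out values ≤ 2, then values > 4, leaving only the two minority examples.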

acdc, artificial intelligence, machine learning

arXiv.org Machine Learning

1606.05325

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.50)
Health & Medicine > Therapeutic Area > Nephrology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.55)

No penalty no tears: Least squares in high-dimensional linear models

Wang, Xiangyu, Dunson, David, Leng, Chenlei

arXiv.org Machine Learning | Jun-16-2016

Ordinary least squares (OLS) is the default method for fitting linear models, but it is not applicable to problems in which the dimensionality exceeds the sample size. For these problems, we advocate the use of a generalized version of OLS motivated by ridge regression, and propose two novel three-step algorithms involving least squares fitting and hard thresholding. The algorithms are methodologically simple to understand, computationally efficient to implement, and theoretically appealing in that they select models consistently. Numerical exercises comparing our methods with penalization-based approaches in simulations and data analyses illustrate the great potential of the proposed algorithms.
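The abstract names the ingredients of the three-step algorithms (a ridge-motivated generalization of OLS, hard thresholding, and least squares fitting) without giving formulas. One plausible reading, sketched below, is: (1) compute the ridge-type estimator β̂ = Xᵀ(XXᵀ + rI)⁻¹y, which exists even when p > n; (2) keep the d coefficients with the largest |β̂ⱼ|; (3) refit ordinary least squares on those columns. The exact estimators, the choice of r and d, and the function name `ridge_threshold_refit` are assumptions for illustration, not the paper's specification.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_threshold_refit(X: Matrix, y: Sequence[float], d: int,
                          r: float = 1.0) -> Tuple[List[int], List[float]]:
    """Hypothetical three-step estimator: ridge-type fit, hard threshold, OLS refit."""
    n, p = len(X), len(X[0])
    # Step 1: beta = X^T (X X^T + r I)^{-1} y, well defined even when p > n.
    G = [[sum(X[i][k] * X[j][k] for k in range(p)) + (r if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(G, y)
    beta = [sum(X[i][k] * alpha[i] for i in range(n)) for k in range(p)]
    # Step 2: hard thresholding, keep the d coefficients with largest |beta_k|.
    support = sorted(sorted(range(p), key=lambda k: abs(beta[k]), reverse=True)[:d])
    # Step 3: ordinary least squares on the selected columns only (needs d <= n).
    XS = [[row[k] for k in support] for row in X]
    XtX = [[sum(XS[i][a] * XS[i][b] for i in range(n)) for b in range(d)] for a in range(d)]
    Xty = [sum(XS[i][a] * y[i] for i in range(n)) for a in range(d)]
    return support, solve(XtX, Xty)


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(30)]  # n=30 < p=40
    y = [3.0 * row[3] - 3.0 * row[7] for row in X]  # true support {3, 7}
    print(ridge_threshold_refit(X, y, d=2, r=1e-3))
```

In this noise-free toy setup the selected support should be [3, 7], and the refit should recover coefficients 3 and −3 up to rounding; with noisy data and unknown sparsity, r and d would have to be tuned.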

algorithm, artificial intelligence, machine learning

arXiv.org Machine Learning

1506.02222

Genre: Research Report (0.64)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Invincea First Machine Learning Based Endpoint Security Company to Join Anti-Malware Testing Standards Organization (AMTSO(TM))

#artificialintelligence | Jun-15-2016, 20:36:06 GMT

FAIRFAX, VA--(Marketwired - June 15, 2016) - Invincea, the leader in advanced endpoint threat protection, announced today that it is the first machine learning based endpoint security company to join the Anti-Malware Testing Standards Organization (AMTSO). Participation in AMTSO furthers Invincea's mission of addressing the global need for improvement in third party testing based on scientific objectivity, quality, and relevance of anti-malware testing methodologies. Hundreds of millions of new pieces of malware are created a year, wreaking havoc on enterprises across industries against the backdrop of obsolete anti-malware approaches. To combat the scourge of malware that evades traditional anti-malware systems, the next-gen endpoint security market has exploded with new companies bringing products to market with fantastic claims. To date, these companies have not been held accountable to their marketing claims by independent scientifically valid testing on the merits of their product technology and approaches.

artificial intelligence, machine learning, press release

#artificialintelligence

Country: North America > United States > Virginia > Fairfax County > Fairfax (0.27)

Genre: Press Release (0.91)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)