AITopics | Performance Analysis

Collaborating Authors

Performance Analysis

News Overviews Instructional Materials AI-Alerts Classics

5 questions to ask about machine learning – Sophos News

#artificialintelligenceJul-26-2017, 04:15:45 GMT

But few of them will get around to explaining what it is, how it works and why you should care. At Sophos, we've made big investments in data science and machine learning, including acquiring machine learning company Invincea and establishing a team of leading data scientists focused on infusing machine learning into the core of our products. Our approach to machine learning has always been based on science, transparency and validation. Instead of describing machine learning like pixie dust to be spread on products, we'll use this forum to describe the nuts, bolts and challenges in machine learning and how we approach it. One of the hardest things when evaluating machine learning products is how to find out what's under the hood and why.

artificial intelligence, detection rate, machine learning, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.90)

Add feedback

Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities

Mukherjee, Subhabrata

arXiv.org Machine LearningJul-26-2017

One of the major hurdles preventing the full exploitation of information from online communities is the widespread concern regarding the quality and credibility of user-contributed content. Prior works in this domain operate on a static snapshot of the community, making strong assumptions about the structure of the data (e.g., relational tables), or consider only shallow features for text classification. To address the above limitations, we propose probabilistic graphical models that can leverage the joint interplay between multiple factors in online communities --- like user interactions, community dynamics, and textual content --- to automatically assess the credibility of user-contributed online content, and the expertise of users and their evolution with user-interpretable explanation. To this end, we devise new models based on Conditional Random Fields for different settings like incorporating partial expert knowledge for semi-supervised learning, and handling discrete labels as well as numeric ratings for fine-grained analysis. This enables applications such as extracting reliable side-effects of drugs from user-contributed posts in healthforums, and identifying credible content in news communities. Online communities are dynamic, as users join and leave, adapt to evolving trends, and mature over time. To capture this dynamics, we propose generative models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian Motion to trace the continuous evolution of user expertise and their language model over time. This allows us to identify expert users and credible content jointly over time, improving state-of-the-art recommender systems by explicitly considering the maturity of users. This also enables applications such as identifying helpful product reviews, and detecting fake and anomalous reviews with limited information.

artificial intelligence, machine learning, natural language, (25 more...)

arXiv.org Machine Learning

1707.08309

Country:

North America > Canada (1.00)
Asia (1.00)
Europe > Germany (0.92)
(4 more...)

Genre:

Summary/Review (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
(2 more...)

Industry:

Media > News (1.00)
Media > Film (1.00)
Leisure & Entertainment (1.00)
(11 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(9 more...)

Add feedback

A comparison of single-trial EEG classification and EEG-informed fMRI across three MR compatible EEG recording systems

Faller, Josef, Hong, Linbi, Cummings, Jennifer, Sajda, Paul

arXiv.org Machine LearningJul-25-2017

Simultaneously recorded electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) can be used to non-invasively measure the spatiotemporal dynamics of the human brain. One challenge is dealing with the artifacts that each modality introduces into the other when the two are recorded concurrently, for example the ballistocardiogram (BCG). We conducted a preliminary comparison of three different MR compatible EEG recording systems and assessed their performance in terms of single-trial classification of the EEG when simultaneously collecting fMRI. We found tradeoffs across all three systems, for example varied ease of setup and improved classification accuracy with reference electrodes (REF) but not for pulse artifact subtraction (PAS) or reference layer adaptive filtering (RLAF).

compatible eeg recording system, fmri analysis, single-trial eeg classification, (9 more...)

arXiv.org Machine Learning

1707.08077

Country:

North America > United States > Wisconsin > Waukesha County > Waukesha (0.05)
North America > United States > New York > Richmond County > New York City (0.05)
North America > United States > New York > Queens County > New York City (0.05)
(5 more...)

Genre: Research Report > New Finding (0.32)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.36)

Add feedback

Neighborhood Features Help Detecting Non-Technical Losses in Big Data Sets

Glauner, Patrick, Meira, Jorge, Dolberg, Lautaro, State, Radu, Bettinger, Franck, Rangoni, Yves, Duarte, Diogo

arXiv.org Artificial IntelligenceJul-25-2017

Electricity theft is a major problem around the world in both developed and developing countries and may range up to 40% of the total electricity distributed. More generally, electricity theft belongs to non-technical losses (NTL), which are losses that occur during the distribution of electricity in power grids. In this paper, we build features from the neighborhood of customers. We first split the area in which the customers are located into grids of different sizes. For each grid cell we then compute the proportion of inspected customers and the proportion of NTL found among the inspected customers. We then analyze the distributions of features generated and show why they are useful to predict NTL. In addition, we compute features from the consumption time series of customers. We also use master data features of customers, such as their customer class and voltage of their connection. We compute these features for a Big Data base of 31M meter readings, 700K customers and 400K inspection results. We then use these features to train four machine learning algorithms that are particularly suitable for Big Data sets because of their parallelizable structure: logistic regression, k-nearest neighbors, linear support vector machine and random forest. Using the neighborhood features instead of only analyzing the time series has resulted in appreciable results for Big Data sets for varying NTL proportions of 1%-90%. This work can therefore be deployed to a wide range of different regions around the world.

artificial intelligence, machine learning, proportion, (18 more...)

arXiv.org Artificial Intelligence

1607.00872

Country:

Asia (0.93)
Europe > Luxembourg (0.16)

Genre: Research Report > New Finding (0.88)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.70)

Add feedback

The Challenge of Non-Technical Loss Detection using Artificial Intelligence: A Survey

Glauner, Patrick, Meira, Jorge Augusto, Valtchev, Petko, State, Radu, Bettinger, Franck

arXiv.org Artificial IntelligenceJul-25-2017

Detection of non-technical losses (NTL) which include electricity theft, faulty meters or billing errors has attracted increasing attention from researchers in electrical engineering and computer science. NTLs cause significant harm to the economy, as in some countries they may range up to 40% of the total electricity distributed. The predominant research direction is employing artificial intelligence to predict whether a customer causes NTL. This paper first provides an overview of how NTLs are defined and their impact on economies, which include loss of revenue and profit of electricity providers and decrease of the stability and reliability of electrical power grids. It then surveys the state-of-the-art research efforts in a up-to-date and comprehensive review of algorithms, features and data sets used. It finally identifies the key scientific and engineering challenges in NTL detection and suggests how they could be addressed in the future.

customer, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.2991/ijcis.2017.10.1.51

1606.00626

Country:

Asia (0.68)
North America > Canada (0.28)
North America > United States (0.28)

Genre: Overview (1.00)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

Add feedback

Large-Scale Detection of Non-Technical Losses in Imbalanced Data Sets

Glauner, Patrick O., Boechat, Andre, Dolberg, Lautaro, State, Radu, Bettinger, Franck, Rangoni, Yves, Duarte, Diogo

arXiv.org Artificial IntelligenceJul-25-2017

Non-technical losses (NTL) such as electricity theft cause significant harm to our economies, as in some countries they may range up to 40% of the total electricity distributed. Detecting NTLs requires costly on-site inspections. Accurate prediction of NTLs for customers using machine learning is therefore crucial. To date, related research largely ignore that the two classes of regular and non-regular customers are highly imbalanced, that NTL proportions may change and mostly consider small data sets, often not allowing to deploy the results in production. In this paper, we present a comprehensive approach to assess three NTL detection models for different NTL proportions in large real world data sets of 100Ks of customers: Boolean rules, fuzzy logic and Support Vector Machine. This work has resulted in appreciable results that are about to be deployed in a leading industry solution. We believe that the considerations and observations made in this contribution are necessary for future smart meter research in order to report their effectiveness on imbalanced and large real world data sets.

artificial intelligence, machine learning, ntl proportion, (16 more...)

arXiv.org Artificial Intelligence

1602.0835

Country: Asia (0.28)

Genre: Research Report (0.50)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data

Bullough, Benjamin L., Yanchenko, Anna K., Smith, Christopher L., Zipkin, Joseph R.

arXiv.org Machine LearningJul-25-2017

Each year, thousands of software vulnerabilities are discovered and reported to the public. Unpatched known vulnerabilities are a significant security risk. It is imperative that software vendors quickly provide patches once vulnerabilities are known and users quickly install those patches as soon as they are available. However, most vulnerabilities are never actually exploited. Since writing, testing, and installing software patches can involve considerable resources, it would be desirable to prioritize the remediation of vulnerabilities that are likely to be exploited. Several published research studies have reported moderate success in applying machine learning techniques to the task of predicting whether a vulnerability will be exploited. These approaches typically use features derived from vulnerability databases (such as the summary text describing the vulnerability) or social media posts that mention the vulnerability by name. However, these prior studies share multiple methodological shortcomings that inflate predictive power of these approaches. We replicate key portions of the prior work, compare their approaches, and show how selection of training and test data critically affect the estimated performance of predictive models. The results of this study point to important methodological considerations that should be taken into account so that results reflect real-world utility.

data mining, machine learning, vulnerability, (17 more...)

arXiv.org Machine Learning

doi: 10.1145/3041008.3041009

1707.08015

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.93)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Add feedback

Accelerating Permutation Testing in Voxel-wise Analysis through Subspace Tracking: A new plugin for SnPM

Gutierrez-Barragan, Felipe, Ithapu, Vamsi K., Hinrichs, Chris, Maumet, Camille, Johnson, Sterling C., Nichols, Thomas E., Singh, Vikas, ADNI, the

arXiv.org Machine LearningJul-24-2017

Permutation testing is a non-parametric method for obtaining the max null distribution used to compute corrected $p$-values that provide strong control of false positives. In neuroimaging, however, the computational burden of running such an algorithm can be significant. We find that by viewing the permutation testing procedure as the construction of a very large permutation testing matrix, $T$, one can exploit structural properties derived from the data and the test statistics to reduce the runtime under certain conditions. In particular, we see that $T$ is low-rank plus a low-variance residual. This makes $T$ a good candidate for low-rank matrix completion, where only a very small number of entries of $T$ ($\sim0.35\%$ of all entries in our experiments) have to be computed to obtain a good estimate. Based on this observation, we present RapidPT, an algorithm that efficiently recovers the max null distribution commonly obtained through regular permutation testing in voxel-wise analysis. We present an extensive validation on a synthetic dataset and four varying sized datasets against two baselines: Statistical NonParametric Mapping (SnPM13) and a standard permutation testing implementation (referred as NaivePT). We find that RapidPT achieves its best runtime performance on medium sized datasets ($50 \leq n \leq 200$), with speedups of 1.5x - 38x (vs. SnPM13) and 20x-1000x (vs. NaivePT). For larger datasets ($n \geq 200$) RapidPT outperforms NaivePT (6x - 200x) on all datasets, and provides large speedups over SnPM13 when more than 10000 permutations (2x - 15x) are needed. The implementation is a standalone toolbox and also integrated within SnPM13, able to leverage multi-core architectures when available.

artificial intelligence, machine learning, rapidpt, (16 more...)

arXiv.org Machine Learning

doi: 10.1016/j.neuroimage.2017.07.025

1703.01506

Country: North America > United States > California (0.28)

Genre:

Research Report > New Finding (0.89)
Research Report > Experimental Study (0.71)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Health Analytics: a systematic review of approaches to detect phenotype cohorts using electronic health records

Hiob, Norman, Lessmann, Stefan

arXiv.org Machine LearningJul-24-2017

The paper presents a systematic review of state-of-the-art approaches to identify patient cohorts using electronic health records. It gives a comprehensive overview of the most commonly de-tected phenotypes and its underlying data sets. Special attention is given to preprocessing of in-put data and the different modeling approaches. The literature review confirms natural language processing to be a promising approach for electronic phenotyping. However, accessibility and lack of natural language process standards for medical texts remain a challenge. Future research should develop such standards and further investigate which machine learning approaches are best suited to which type of medical data.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

1707.07425

Country: North America > United States (0.47)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > Promising Solution (0.86)
Research Report > New Finding (0.69)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Iterative Hard Thresholding for Model Selection in Genome-Wide Association Studies

Keys, Kevin L., Chen, Gary K., Lange, Kenneth

arXiv.org Machine LearningJul-24-2017

A genome-wide association study (GWAS) correlates marker variation with trait variation in a sample of individuals. Each study subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here we assume that subjects are unrelated and collected at random and that trait values are normally distributed or transformed to normality. Over the past decade, researchers have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies present unique computational challenges. Penalized regression with LASSO or MCP penalties is capable of selecting a handful of associated SNPs from millions of potential SNPs. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinning of a trait. This paper introduces the iterative hard thresholding (IHT) algorithm to the GWAS analysis of continuous traits. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to leverage commodity desktop computers in GWAS analysis and to avoid supercomputing. We evaluate IHT performance on both simulated and real GWAS data and conclude that it reduces false positive and false negative rates while remaining competitive in computational time with penalized regression. Source code is freely available at https://github.com/klkeys/IHT.jl.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1608.01398

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > San Francisco County > San Francisco (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback