AITopics | Performance Analysis

Performance Analysis

Kernel Two-Sample Hypothesis Testing Using Kernel Set Classification

arXiv.org Machine Learning, Nov-13-2017

The two-sample hypothesis testing problem is studied for the challenging scenario of high dimensional data sets with small sample sizes. We show that the two-sample hypothesis testing problem can be posed as a one-class set classification problem. In the set classification problem the goal is to classify a set of data points that are assumed to have a common class. We prove that the average probability of error given a set is less than or equal to the Bayes error and decreases as a power of $n$, the number of sample data points in the set. We use the positive definite Set Kernel for directly mapping sets of data to an associated Reproducing Kernel Hilbert Space, without the need to learn a probability distribution. We specifically solve the two-sample hypothesis testing problem using a one-class SVM in conjunction with the proposed Set Kernel. We compare the proposed method with the Maximum Mean Discrepancy, F-Test and T-Test methods on a number of challenging simulated high dimensional and small sample size data sets. We also perform two-sample hypothesis testing experiments on six cancer gene expression data sets and achieve zero type-I and type-II error results on all data sets.
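For orientation, the Maximum Mean Discrepancy baseline named in the abstract can be sketched as follows. This is a minimal illustration of the standard biased MMD² estimate with a Gaussian kernel, not the paper's proposed Set Kernel method; the function names, the bandwidth value, and the toy samples are assumptions made for this example.

```python
import math

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Hypothetical toy samples: identical sets versus a clearly shifted set.
sample_a = [(0.0, 0.0), (1.0, 0.0)]
sample_b = [(5.0, 5.0), (6.0, 5.0)]
same = mmd2_biased(sample_a, sample_a)
shifted = mmd2_biased(sample_a, sample_b)
```

A larger statistic suggests the two samples come from different distributions; turning it into a test would additionally require a threshold, for example from a permutation procedure, which is omitted here.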

artificial intelligence, dimension, machine learning, (17 more...)

1706.05612

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > Experimental Study (0.76)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Discovering Potential Correlations via Hypercontractivity

Kim, Hyeji, Gao, Weihao, Kannan, Sreeram, Oh, Sewoong, Viswanath, Pramod

arXiv.org Machine Learning, Nov-13-2017

Discovering a correlation from one variable to another variable is of fundamental scientific and practical interest. While existing correlation measures are suitable for discovering average correlation, they fail to discover hidden or potential correlations. To bridge this gap, (i) we postulate a set of natural axioms that we expect a measure of potential correlation to satisfy; (ii) we show that the rate of information bottleneck, i.e., the hypercontractivity coefficient, satisfies all the proposed axioms; (iii) we provide a novel estimator to estimate the hypercontractivity coefficient from samples; and (iv) we provide numerical experiments demonstrating that this proposed estimator discovers potential correlations among various indicators of WHO datasets, is robust in discovering gene interactions from gene expression time series data, and is statistically more powerful than the estimators for other correlation measures in binary hypothesis testing of canonical examples of potential correlations.

artificial intelligence, correlation, machine learning, (16 more...)

doi: 10.3390/e19110586

1709.04024

Country:

Europe (0.93)
North America > United States (0.46)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

A Sparse Graph-Structured Lasso Mixed Model for Genetic Association with Confounding Correction

Ye, Wenting, Liu, Xiang, Wang, Haohan, Xing, Eric P.

arXiv.org Machine Learning, Nov-11-2017

While the linear mixed model (LMM) has shown competitive performance in correcting spurious associations raised by population stratification, family structures, and cryptic relatedness, more challenges are still to be addressed regarding the complex structure of genotypic and phenotypic data. For example, geneticists have discovered that some clusters of phenotypes are more co-expressed than others. Hence, a joint analysis that can utilize such relatedness information in a heterogeneous data set is crucial for genetic modeling. We propose the sparse graph-structured linear mixed model (sGLMM), which can incorporate the relatedness information from traits in a dataset with confounding correction. Our method is capable of uncovering the genetic associations of a large number of phenotypes together while considering the relatedness of these phenotypes. Through extensive simulation experiments, we show that the proposed model outperforms other existing approaches and can model correlation from both population structure and shared signals. Further, we validate the effectiveness of sGLMM on real-world genomic datasets from two different species, plants and humans. In Arabidopsis thaliana data, sGLMM behaves better than all other baseline models for 63.4% of traits. We also discuss the potential causal genetic variation of human Alzheimer's disease discovered by our model and justify some of the most important genetic loci.

artificial intelligence, machine learning, mixed model, (19 more...)

1711.04162

Country: North America > United States (0.68)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)

Learning the PE Header, Malware Detection with Minimal Domain Knowledge

Raff, Edward, Sylvester, Jared, Nicholas, Charles

arXiv.org Machine Learning, Nov-11-2017

Many efforts have been made to use various forms of domain knowledge in malware detection. Currently there exist two common approaches to malware detection without domain knowledge, namely byte n-grams and strings. In this work we explore the feasibility of applying neural networks to malware detection and feature learning. We do this by restricting ourselves to a minimal amount of domain knowledge in order to extract a portion of the Portable Executable (PE) header. By doing this we show that neural networks can learn from raw bytes without explicit feature construction, and perform even better than a domain knowledge approach that parses the PE header into explicit features.

artificial intelligence, machine learning, neural network, (15 more...)

doi: 10.1145/3128572.3140442

1709.01471

Country: North America > United States > Maryland (0.67)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Related Datasets in Oracle DV Machine Learning models

#artificialintelligence, Nov-10-2017, 15:45:43 GMT

Depending on the algorithm/model that generates this dataset, the metrics present in the dataset will vary. Here is a list of metrics based on the model:
Linear Regression, CART numeric, Elastic Net Linear: R-Square, R-Square Adjusted, Mean Absolute Error (MAE), Mean Squared Error (MSE), Relative Absolute Error (RAE), Relative Squared Error (RSE), Root Mean Squared Error (RMSE).
CART (Classification And Regression Trees), Naive Bayes Classification, Neural Network, Support Vector Machine (SVM), Random Forest, Logistic Regression:
Now you know what the Related datasets are and how they can be useful for fine tuning your Machine Learning model or for comparing two different models.
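The regression metrics listed above follow standard textbook definitions, which can be sketched in a few lines. This is a generic illustration, not Oracle DV's implementation; the function name and the sample values are assumptions, and adjusted R-Square is left out because it also needs the number of predictors.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute common regression error metrics from paired lists of numbers."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Baseline model: always predict the mean of the observed values.
    abs_base = sum(abs(t - mean_true) for t in y_true)
    sq_base = sum((t - mean_true) ** 2 for t in y_true)
    mse = sq_err / n
    rse = sq_err / sq_base
    return {
        "MAE": abs_err / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "RAE": abs_err / abs_base,   # error relative to the mean baseline
        "RSE": rse,                  # squared error relative to the baseline
        "R2": 1.0 - rse,
    }

# Hypothetical observed and predicted values.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```

The relative metrics (RAE, RSE) compare the model against a baseline that always predicts the mean, so values below 1 indicate the model beats that baseline.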

artificial intelligence, decision tree learning, machine learning, (19 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.81)

Dynamic Analysis of Executables to Detect and Characterize Malware

Smith, Michael R., Ingram, Joe B., Lamb, Christopher C., Draelos, Timothy J., Doak, Justin E., Aimone, James B., James, Conrad D.

arXiv.org Machine Learning, Nov-10-2017

It is needed to ensure the integrity of systems that process sensitive information and control many aspects of everyday life. We examine the use of machine learning algorithms to detect malware using the system calls generated by executables, alleviating attempts at obfuscation as the behavior is monitored rather than the bytes of an executable. We examine several machine learning techniques for detecting malware including random forests, deep learning techniques, and liquid state machines. The experiments examine the effects of concept drift on each algorithm to understand how well the algorithms generalize to novel malware samples by testing them on data that was collected after the training data. The results suggest that each of the examined machine learning algorithms is a viable solution to detect malware, achieving between 90% and 95% class-averaged accuracy (CAA). In real-world scenarios, the performance evaluation on an operational network may not match the performance achieved in training. Namely, the CAA may be about the same, but the values for precision and recall over the malware can change significantly. We structure experiments to highlight these caveats and offer insights into expected performance in operational environments. In addition, we use the induced models to gain a better understanding about what differentiates the malware samples from the goodware, which can further be used as a forensics tool to understand what the malware (or goodware) was doing to provide directions for investigation and remediation.
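One way the CAA can stay about the same while precision changes is a shift in the class ratio between the test set and the operational network. The sketch below illustrates only that mechanism; the confusion-matrix counts are invented for the example and are not taken from the paper, and the function name is hypothetical.

```python
def rates_from_counts(tp, fn, fp, tn):
    """Derive class-averaged accuracy, precision and recall from counts.

    The positive class is malware; the negative class is goodware.
    """
    recall = tp / (tp + fn)          # fraction of malware detected
    specificity = tn / (tn + fp)     # fraction of goodware passed
    precision = tp / (tp + fp)       # fraction of alerts that are malware
    caa = (recall + specificity) / 2.0
    return {"caa": caa, "precision": precision, "recall": recall}

# Assumed balanced test set: 1000 malware and 1000 goodware samples.
balanced = rates_from_counts(tp=950, fn=50, fp=50, tn=950)

# Assumed operational mix: 100 malware among 10000 goodware samples,
# with the same per-class detection rates as above.
operational = rates_from_counts(tp=95, fn=5, fp=500, tn=9500)
```

With these assumed counts both settings give a CAA of 0.95, yet precision falls from 0.95 to roughly 0.16 because false alarms on the much larger goodware class outnumber true detections.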

artificial intelligence, machine learning, malware, (17 more...)

1711.03947

Country: North America > United States > New York (0.14)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

7 Important Model Evaluation Error Metrics Everyone should know

#artificialintelligence, Nov-9-2017, 12:20:50 GMT

Predictive modeling works on the constructive feedback principle. Get feedback from metrics, make improvements, and continue until you achieve a desirable accuracy. Evaluation metrics explain the performance of a model. An important aspect of evaluation metrics is their capability to discriminate among model results. Once they are finished building a model, they hurriedly map predicted values on unseen data. This is an incorrect approach. Simply building a predictive model is not your motive; the goal is creating and selecting a model that gives high accuracy on out-of-sample data.

artificial intelligence, machine learning, validation, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Data Science (0.90)

Breast density classification with deep convolutional neural networks

Wu, Nan, Geras, Krzysztof J., Shen, Yiqiu, Su, Jingyi, Kim, S. Gene, Kim, Eric, Wolfson, Stacey, Moy, Linda, Cho, Kyunghyun

arXiv.org Machine Learning, Nov-9-2017

Breast density classification is an essential part of breast cancer screening. Although a lot of prior work considered this problem as a task for learning algorithms, to our knowledge, all of them used small and not clinically realistic data both for training and evaluation of their models. In this work, we explore the limits of this task with a data set coming from over 200,000 breast cancer screening exams. We use this data to train and evaluate a strong convolutional neural network classifier. In a reader study, we find that our model can perform this task comparably to a human expert.

artificial intelligence, machine learning, neural network, (18 more...)

1711.03674

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fast Meta-Learning for Adaptive Hierarchical Classifier Design

Burg, Gerrit J. J. van den, Hero, Alfred O.

arXiv.org Machine Learning, Nov-9-2017

The Bayes error rate (BER) is a central concept in the statistical theory of classification. It represents the error rate of the Bayes classifier, which assigns a label to an object corresponding to the class with the highest posterior probability. By definition, the Bayes error represents the smallest possible average error rate that can be achieved by any decision rule (Wald, 1947). Because of these properties, the BER is of great interest both for benchmarking classification algorithms and for the practical design of classification algorithms. For example, an accurate approximation of the BER can be used for classifier parameter selection, data dimensionality reduction, or variable selection. However, accurate BER approximation is difficult, especially in high dimension, and thus much attention has focused on tight and tractable BER bounds. This paper proposes a model-free approach to designing multiclass classifiers using a bias-corrected BER bound estimated directly from the multiclass data. There exist several useful bounds on the BER that are functions of the class-dependent feature distributions. These include information theoretic divergence measures such as the Chernoff $\alpha$-divergence (Chernoff, 1952), the Bhattacharyya divergence (Kailath, 1967), or the Jensen-Shannon divergence (Lin, 1991).
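As a reminder of how such divergence-based bounds arise, the two-class case can be written out as below. This is the standard textbook derivation, not the bias-corrected multiclass bound the paper proposes; the symbols $p_1, p_2$ (class priors) and $f_1, f_2$ (class-conditional densities) are notation assumed for this sketch.

```latex
% Two-class Bayes error with priors p_1, p_2 and densities f_1, f_2:
\epsilon^{\mathrm{Bayes}} = \int \min\bigl(p_1 f_1(x),\, p_2 f_2(x)\bigr)\, dx .

% Since \min(a,b) \le a^{\alpha} b^{1-\alpha} for a, b \ge 0 and \alpha \in [0,1],
% the Chernoff bound follows:
\epsilon^{\mathrm{Bayes}} \le p_1^{\alpha} p_2^{1-\alpha}
  \int f_1(x)^{\alpha} f_2(x)^{1-\alpha}\, dx .

% Setting \alpha = 1/2 gives the Bhattacharyya bound:
\epsilon^{\mathrm{Bayes}} \le \sqrt{p_1 p_2} \int \sqrt{f_1(x)\, f_2(x)}\, dx .
```

The integral in the last line is the Bhattacharyya coefficient, so the bound tightens as the two class-conditional densities overlap less.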

artificial intelligence, classification problem, machine learning, (16 more...)

1711.03512

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

20 Questions to Detect Fake Data Scientists

@machinelearnbot, Nov-8-2017, 00:26:15 GMT

Check the answers from KDnuggets Editors to these questions (and one more): 21 Must-Know Data Science Interview Questions and Answers. Now that the Data Scientist is officially the sexiest job of the 21st century, everyone wants a piece of the pie. That means there are a few data posers out there: people who call themselves Data Scientists, but who don't actually have the right skill set. This isn't always done out of a desire to deceive. The newness of data science and the lack of a widely understood job description mean that many people may think they are data scientists purely because they deal with data.

artificial intelligence, data mining, machine learning, (15 more...)

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)

Technology:

Information Technology > Data Science > Data Mining (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)
