AITopics | Accuracy

Accuracy

Discovering Potential Correlations via Hypercontractivity

Kim, Hyeji, Gao, Weihao, Kannan, Sreeram, Oh, Sewoong, Viswanath, Pramod

arXiv.org Machine Learning | Nov-13-2017

Discovering a correlation from one variable to another variable is of fundamental scientific and practical interest. While existing correlation measures are suitable for discovering average correlation, they fail to discover hidden or potential correlations. To bridge this gap, (i) we postulate a set of natural axioms that we expect a measure of potential correlation to satisfy; (ii) we show that the rate of information bottleneck, i.e., the hypercontractivity coefficient, satisfies all the proposed axioms; (iii) we provide a novel estimator to estimate the hypercontractivity coefficient from samples; and (iv) we provide numerical experiments demonstrating that this proposed estimator discovers potential correlations among various indicators of WHO datasets, is robust in discovering gene interactions from gene expression time series data, and is statistically more powerful than the estimators for other correlation measures in binary hypothesis testing of canonical examples of potential correlations.

artificial intelligence, correlation, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.3390/e19110586

1709.04024

Country:

Europe (0.93)
North America > United States (0.46)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

A Sparse Graph-Structured Lasso Mixed Model for Genetic Association with Confounding Correction

Ye, Wenting, Liu, Xiang, Wang, Haohan, Xing, Eric P.

arXiv.org Machine Learning | Nov-11-2017

While the linear mixed model (LMM) has shown competitive performance in correcting spurious associations arising from population stratification, family structures, and cryptic relatedness, challenges remain regarding the complex structure of genotypic and phenotypic data. For example, geneticists have discovered that some clusters of phenotypes are more co-expressed than others. Hence, a joint analysis that can utilize such relatedness information in a heterogeneous data set is crucial for genetic modeling. We propose the sparse graph-structured linear mixed model (sGLMM), which can incorporate the relatedness information from traits in a dataset with confounding correction. Our method is capable of uncovering the genetic associations of a large number of phenotypes together while considering the relatedness of these phenotypes. Through extensive simulation experiments, we show that the proposed model outperforms other existing approaches and can model correlation from both population structure and shared signals. Further, we validate the effectiveness of sGLMM on real-world genomic datasets from two different species, a plant and humans. On Arabidopsis thaliana data, sGLMM performs better than all other baseline models for 63.4% of traits. We also discuss the potential causal genetic variation of human Alzheimer's disease discovered by our model and justify some of the most important genetic loci.

artificial intelligence, machine learning, mixed model, (19 more...)

arXiv.org Machine Learning

1711.04162

Country: North America > United States (0.68)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)

Learning the PE Header, Malware Detection with Minimal Domain Knowledge

Raff, Edward, Sylvester, Jared, Nicholas, Charles

arXiv.org Machine Learning | Nov-11-2017

Many efforts have been made to use various forms of domain knowledge in malware detection. Currently there exist two common approaches to malware detection without domain knowledge, namely byte n-grams and strings. In this work we explore the feasibility of applying neural networks to malware detection and feature learning. We do this by restricting ourselves to a minimal amount of domain knowledge in order to extract a portion of the Portable Executable (PE) header. By doing this we show that neural networks can learn from raw bytes without explicit feature construction, and perform even better than a domain knowledge approach that parses the PE header into explicit features.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Machine Learning

doi: 10.1145/3128572.3140442

1709.01471

Country: North America > United States > Maryland (0.67)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Related Datasets in Oracle DV Machine Learning models

#artificialintelligence | Nov-10-2017, 15:45:43 GMT

Depending on the algorithm/model that generates this dataset, the metrics present in the dataset will vary. Here is a list of metrics based on the model: Linear Regression, CART numeric, Elastic Net Linear: R-Square, R-Square Adjusted, Mean Absolute Error (MAE), Mean Squared Error (MSE), Relative Absolute Error (RAE), Relative Squared Error (RSE), Root Mean Squared Error (RMSE). CART (Classification And Regression Trees), Naive Bayes Classification, Neural Network, Support Vector Machine (SVM), Random Forest, Logistic Regression: ... Now you know what the Related datasets are and how they can be useful for fine-tuning your Machine Learning model or for comparing two different models.
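As a rough illustration of the regression metrics named in this excerpt, the sketch below computes them from a list of true and predicted values using their common textbook definitions. The function name `regression_metrics` and the `n_features` parameter are hypothetical, and the exact formulas used by Oracle DV are an assumption here; they may differ in detail.

```python
import math

def regression_metrics(y_true, y_pred, n_features=1):
    """Textbook regression error metrics (assumed definitions, not Oracle DV's code)."""
    n = len(y_true)
    mean_y = sum(y_true) / n

    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Errors of the trivial "always predict the mean" baseline.
    abs_dev = sum(abs(t - mean_y) for t in y_true)
    sq_dev = sum((t - mean_y) ** 2 for t in y_true)

    mse = sq_err / n
    rse = sq_err / sq_dev           # Relative Squared Error
    r2 = 1.0 - rse                  # R-Square
    # Adjusted R-Square penalizes the number of predictors.
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {
        "MAE": abs_err / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "RAE": abs_err / abs_dev,   # Relative Absolute Error
        "RSE": rse,
        "R2": r2,
        "R2_adj": r2_adj,
    }

metrics = regression_metrics([1, 2, 3, 4, 5], [1.5, 2, 2.5, 4, 5.5])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

RAE and RSE compare the model against a baseline that always predicts the mean, so values below 1 indicate the model beats that baseline.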

7 Important Model Evaluation Error Metrics Everyone should know

#artificialintelligence | Nov-9-2017, 12:20:50 GMT

Predictive modeling works on a constructive feedback principle: get feedback from metrics, make improvements, and continue until you achieve a desirable accuracy. Evaluation metrics explain the performance of a model. An important aspect of evaluation metrics is their capability to discriminate among model results. Some practitioners, once they have finished building a model, hurriedly map predicted values onto unseen data. This is an incorrect approach. Simply building a predictive model is not the goal; the goal is to create and select a model that gives high accuracy on out-of-sample data.
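The point about out-of-sample accuracy can be sketched with a simple holdout split. The example below is illustrative only: the synthetic data, the 1-nearest-neighbour classifier, and all function names are assumptions chosen to show how in-sample accuracy can look perfect while held-out accuracy is lower.

```python
import random

def make_noisy_data(n=200, noise=0.2, seed=1):
    """Synthetic 1-D data: label is 1 when x > 0.5, flipped with probability `noise`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        rows.append((x, label))
    return rows

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]

def accuracy(train, rows):
    hits = sum(1 for x, y in rows if predict_1nn(train, x) == y)
    return hits / len(rows)

data = make_noisy_data()
train, test = train_test_split(data)
in_sample = accuracy(train, train)      # the model has memorized these points
out_of_sample = accuracy(train, test)   # the number that matters for selection
print(f"in-sample accuracy:     {in_sample:.3f}")
print(f"out-of-sample accuracy: {out_of_sample:.3f}")
```

A 1-nearest-neighbour model reproduces its training labels exactly, so its in-sample score says nothing about how it handles the label noise; only the held-out score does.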

artificial intelligence, machine learning, validation, (18 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Data Science (0.90)

Breast density classification with deep convolutional neural networks

Wu, Nan, Geras, Krzysztof J., Shen, Yiqiu, Su, Jingyi, Kim, S. Gene, Kim, Eric, Wolfson, Stacey, Moy, Linda, Cho, Kyunghyun

arXiv.org Machine Learning | Nov-9-2017

Breast density classification is an essential part of breast cancer screening. Although a lot of prior work considered this problem as a task for learning algorithms, to our knowledge, all of them used small and not clinically realistic data both for training and evaluation of their models. In this work, we explore the limits of this task with a data set coming from over 200,000 breast cancer screening exams. We use this data to train and evaluate a strong convolutional neural network classifier. In a reader study, we find that our model can perform this task comparably to a human expert.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Machine Learning

1711.03674

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fast Meta-Learning for Adaptive Hierarchical Classifier Design

Burg, Gerrit J. J. van den, Hero, Alfred O.

arXiv.org Machine Learning | Nov-9-2017

The Bayes error rate (BER) is a central concept in the statistical theory of classification. It represents the error rate of the Bayes classifier, which assigns a label to an object corresponding to the class with the highest posterior probability. By definition, the Bayes error represents the smallest possible average error rate that can be achieved by any decision rule (Wald, 1947). Because of these properties, the BER is of great interest both for benchmarking classification algorithms and for the practical design of classification algorithms. For example, an accurate approximation of the BER can be used for classifier parameter selection, data dimensionality reduction, or variable selection. However, accurate BER approximation is difficult, especially in high dimensions, and thus much attention has focused on tight and tractable BER bounds. This paper proposes a model-free approach to designing multiclass classifiers using a bias-corrected BER bound estimated directly from the multiclass data. There exist several useful bounds on the BER that are functions of the class-dependent feature distributions. These include information theoretic divergence measures such as the Chernoff α-divergence (Chernoff, 1952), the Bhattacharyya divergence (Kailath, 1967), or the Jensen-Shannon divergence (Lin, 1991).
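To make the idea of a divergence-based BER bound concrete, the sketch below evaluates the classical Bhattacharyya bounds for two univariate Gaussian classes and compares them with the exact Bayes error in the equal-variance, equal-prior case. This is not the bias-corrected estimator the paper proposes; the Gaussian setting, function names, and parameter values are assumptions made for illustration.

```python
import math

def bhattacharyya_distance(mu1, sigma1, mu2, sigma2):
    """Bhattacharyya distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    mean_term = 0.25 * (mu1 - mu2) ** 2 / var_sum
    var_term = 0.5 * math.log(var_sum / (2.0 * sigma1 * sigma2))
    return mean_term + var_term

def ber_bounds(mu1, sigma1, mu2, sigma2, prior1=0.5):
    """Lower and upper Bhattacharyya bounds on the two-class Bayes error."""
    prior2 = 1.0 - prior1
    # Bhattacharyya coefficient: overlap between the two class densities.
    bc = math.exp(-bhattacharyya_distance(mu1, sigma1, mu2, sigma2))
    upper = math.sqrt(prior1 * prior2) * bc
    lower = 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * prior1 * prior2 * bc ** 2))
    return lower, upper

def exact_ber_equal_variance(mu1, mu2, sigma):
    """Exact Bayes error for equal priors and a shared standard deviation."""
    z = abs(mu1 - mu2) / (2.0 * sigma)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # standard normal CDF at -z

lower, upper = ber_bounds(0.0, 1.0, 2.0, 1.0)
exact = exact_ber_equal_variance(0.0, 2.0, 1.0)
print(f"lower bound: {lower:.4f}")
print(f"exact BER:   {exact:.4f}")
print(f"upper bound: {upper:.4f}")
```

In this toy case the bounds bracket the exact error but are not tight, which is the motivation for the tighter bounds the abstract refers to.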

artificial intelligence, classification problem, machine learning, (16 more...)

arXiv.org Machine Learning

1711.03512

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

20 Questions to Detect Fake Data Scientists

@machinelearnbot | Nov-8-2017, 00:26:15 GMT

Check the answers from KDnuggets Editors to these questions (and one more): 21 Must-Know Data Science Interview Questions and Answers. Now that the Data Scientist is officially the sexiest job of the 21st century, everyone wants a piece of the pie. That means there are a few data posers out there: people who call themselves Data Scientists, but who don't actually have the right skill set. This isn't always done out of a desire to deceive. The newness of data science and the lack of a widely understood job description mean that many people may think they are data scientists purely because they deal with data.

artificial intelligence, data mining, machine learning, (15 more...)

@machinelearnbot

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)

Technology:

Information Technology > Data Science > Data Mining (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

Efficient Multiple Incremental Computation for Kernel Ridge Regression with Bayesian Uncertainty Modeling

Chen, Bo-Wei, Abdullah, Nik Nailah Binti, Park, Sangoh

arXiv.org Machine Learning | Nov-8-2017

This study presents an efficient incremental/decremental approach for big streams based on Kernel Ridge Regression (KRR), a frequently used data analysis method in cloud centers. To avoid reanalyzing the whole dataset whenever sensors receive new training data, typical incremental KRR used a single-instance mechanism for updating an existing system. However, this inevitably increased redundant computational time and limited applicability to big streams. To this end, the proposed mechanism supports incremental/decremental processing for both single and multiple samples (i.e., batch processing). Large-scale data can be divided into batches and processed by a machine without sacrificing accuracy. Moreover, incremental/decremental analyses in empirical and intrinsic space are also proposed in this study to handle different types of data, either with a large number of samples or with high feature dimensions, whereas typical methods focused on only one type. At the end of this study, we further extend the proposed mechanism to statistical Kernelized Bayesian Regression, so that uncertainty modeling with incremental/decremental computation becomes applicable. Experimental results showed that computational time was significantly reduced compared with both the original nonincremental design and the typical single incremental method. Furthermore, the accuracy of the proposed method remained the same as the baselines. This implied that the system enhanced efficiency without sacrificing accuracy. These findings indicated that the proposed method was appropriate for variable streaming data analysis, thereby demonstrating its effectiveness.

artificial intelligence, bayesian inference, machine learning, (13 more...)

arXiv.org Machine Learning

doi: 10.1016/j.future.2017.08.053

1608.00621

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)

Agrawal, Amritanshu, Fu, Wei, Menzies, Tim

arXiv.org Artificial Intelligence | Nov-7-2017

Context: Topic modeling finds human-readable structures in unstructured textual data. A widely used topic modeler is Latent Dirichlet allocation (LDA). When run on different datasets, LDA suffers from "order effects", i.e., different topics are generated if the order of training data is shuffled. Such order effects introduce a systematic error for any study. This error can lead to misleading results; specifically, inaccurate topic descriptions and a reduction in the efficacy of text mining classification results. Objective: To provide a method in which distributions generated by LDA are more stable and can be used for further analysis. Method: We use LDADE, a search-based software engineering tool that tunes LDA's parameters using DE (Differential Evolution). LDADE is evaluated on data from a programmer information exchange site (Stackoverflow), title and abstract text of thousands of Software Engineering (SE) papers, and software defect reports from NASA. Results were collected across different implementations of LDA (Python+Scikit-Learn, Scala+Spark); across different platforms (Linux, Macintosh); and for different kinds of LDAs (VEM, or using Gibbs sampling). Results were scored via topic stability and text mining classification accuracy. Results: In all treatments: (i) standard LDA exhibits very large topic instability; (ii) LDADE's tunings dramatically reduce cluster instability; (iii) LDADE also leads to improved performances for supervised as well as unsupervised learning. Conclusion: Due to topic instability, using standard LDA with its "off-the-shelf" settings should now be deprecated. Also, in future, we should require SE papers that use LDA to test and (if needed) mitigate LDA topic instability. Finally, LDADE is a candidate technology for effectively and efficiently reducing that instability.
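One simple way to picture "topic stability" is to compare the top terms of topics produced by runs on differently shuffled data. The sketch below scores stability as the median best-match Jaccard overlap between the topics of a reference run and those of the other runs. It is a simplified stand-in, not the stability score defined in the paper, and the function names and toy topics are hypothetical.

```python
from statistics import median

def jaccard(terms_a, terms_b):
    """Jaccard overlap between two non-empty collections of top terms."""
    a, b = set(terms_a), set(terms_b)
    return len(a & b) / len(a | b)

def topic_stability(runs):
    """Median best-match overlap of the first run's topics against every other run.

    `runs` is a list of runs; each run is a list of topics; each topic is a
    list of its top terms. A score of 1.0 means every topic reappears exactly.
    """
    reference, others = runs[0], runs[1:]
    scores = []
    for topic in reference:
        for run in others:
            # Topics have no fixed order across runs, so take the best match.
            scores.append(max(jaccard(topic, other) for other in run))
    return median(scores)

# Toy topics standing in for two LDA runs on shuffled copies of the same data.
run_a = [["code", "bug", "test"], ["java", "class", "object"]]
run_b = [["java", "class", "method"], ["code", "bug", "fix"]]

print(topic_stability([run_a, run_a]))  # identical runs
print(topic_stability([run_a, run_b]))  # partially overlapping runs
```

A tuner in the spirit of the abstract would search over LDA parameters to push a score like this upward, although the actual objective and search procedure are those given in the paper.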

data mining, evolutionary algorithm, machine learning, (19 more...)

arXiv.org Artificial Intelligence

1608.08176

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)
