 Accuracy


A Neural Network Model to Classify Liver Cancer Patients Using Data Expansion and Compression

arXiv.org Machine Learning

We develop a neural network model to classify liver cancer patients into high-risk and low-risk groups using genomic data. Our approach provides a novel technique to classify big data sets using neural network models. We preprocess the data before training the neural network models. We first expand the data using wavelet analysis. We then compress the wavelet coefficients by mapping them onto a new scaled orthonormal coordinate system. Then the data is used to train a neural network model that enables us to classify cancer patients into two different classes of high-risk and low-risk patients. We use the leave-one-out approach to build a neural network model. This neural network model enables us to classify a patient using genomic data as a high-risk or low-risk patient without any information about the survival time of the patient. The results from genomic data analysis are compared with survival time analysis. It is shown that the expansion and compression of data using wavelet analysis and singular value decomposition (SVD) is essential to train the neural network model.


Machine Learning Basics with Naive Bayes

#artificialintelligence

After researching the different algorithms associated with Machine Learning, I've found that there is an abundance of great material showing you how to use certain algorithms in a specific language. However, what's usually missing is a simple mathematical explanation of how the algorithm works. This may not always be possible without a strong mathematical background, but for some algorithms I know I would definitely find it useful. This post requires just basic mathematics knowledge and an interest in data science and machine learning. I will be talking about Naive Bayes as a classifier and explaining in simple terms how it works and when you might use it.


Naive Bayes Quiz

#artificialintelligence



When an AI machine studied declassified State Department cables, it found secrets that should have been confidential

#artificialintelligence

The U.S. State Department generates some two billion e-mails every year. A significant fraction of these contain sensitive or secret information and so have to be classified, a process that is time-consuming and costly. In 2015 alone, it spent $16 billion to protect classified information. But the reliability of this process of classification is unclear. Nobody knows whether the rules for classifying information are applied consistently and reliably.


Andre Ward vs. Sergey Kovalev: PPV Info, Actual Start Time, Prediction, Betting Odds, For Championship Boxing Event

International Business Times

Andre Ward (30-0, 15 KOs) and Sergey Kovalev (30-0-1, 26 KOs) are among the best pound-for-pound boxers in the world, despite being nowhere close to being household names. The two light heavyweights will battle for Kovalev's WBA, IBF and WBO light heavyweight titles on Saturday night at T-Mobile Arena in Las Vegas in perhaps the most anticipated boxing event of 2016. Boxing purists are likely drooling over this fight because of the obvious similarities between the two superstars. They are the same height, have almost the same reach, have the same number of wins, fight from an orthodox stance, are fearless in the ring, and both fly under the radar despite their immense talent. Indeed, it's been a long time since there was a championship fight that pitted such evenly matched fighters.


Appraisal of Statistical Practices in HRI vis-a-vis the T-Test for Likert Items/Scales

AAAI Conferences

Likert items and scales are often used in human subject studies to measure subjective responses of subjects to the treatment levels. In the field of human-robot interaction (HRI), with few widely accepted quantitative metrics, researchers often rely on Likert items and scales to evaluate their systems. However, there is a debate on what is the best statistical method to evaluate the differences between experimental treatments based on Likert item or scale responses. Likert responses are ordinal and not interval, meaning that the differences between consecutive responses to a Likert item are not equally spaced quantitatively. Hence, parametric tests like the t-test, which require interval and normally distributed data, are often claimed to be statistically unsound in evaluating Likert response data. The statistical purist would use non-parametric tests, such as the Mann-Whitney U test, to evaluate the differences in ordinal datasets; however, non-parametric tests sacrifice sensitivity in detecting differences for a more conservative specificity -- or false positive rate. Finally, it is common practice in the field of HRI to sum up similar individual Likert items to form a Likert scale and use the t-test or ANOVA on the scale, seeking the refuge of the central limit theorem. In this paper, we empirically evaluate the validity of the t-test vs. the Mann-Whitney U test for Likert items and scales. We conduct our investigation via Monte Carlo simulation to quantify sensitivity and specificity of the tests.
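A Monte Carlo comparison of this kind can be sketched as follows. The abstract does not give the simulation design, so everything specific here is an assumption made for illustration: a single 5-point Likert item, 30 subjects per group, the two response distributions, a pooled-variance t-test with a hard-coded critical value, and a Mann-Whitney U test using the tie-corrected normal approximation.

```python
import math
import random
from collections import Counter

LEVELS = [1, 2, 3, 4, 5]          # 5-point Likert item (assumed)
N_PER_GROUP = 30                  # subjects per group (assumed)
T_CRIT = 2.002                    # approx. two-sided 5% critical value, Student's t, 58 df
Z_CRIT = 1.96                     # two-sided 5% critical value, standard normal

NULL = [1, 2, 4, 2, 1]            # hypothetical response weights, both groups equal
SHIFTED = [1, 1, 2, 3, 3]         # hypothetical weights for a group that rates higher


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def t_test_rejects(a, b):
    """Pooled-variance two-sample t-test at the 5% level (T_CRIT assumes 30 + 30)."""
    (m1, v1), (m2, v2) = mean_var(a), mean_var(b)
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    if pooled == 0:
        return False
    return abs(m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2)) > T_CRIT


def mann_whitney_rejects(a, b):
    """Mann-Whitney U test via the normal approximation with tie correction."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    counts = Counter(a + b)
    rank_of, start = {}, 0
    for value in sorted(counts):          # assign midranks to tied values
        c = counts[value]
        rank_of[value] = start + (c + 1) / 2
        start += c
    u1 = sum(rank_of[x] for x in a) - n1 * (n1 + 1) / 2
    tie_term = sum(c ** 3 - c for c in counts.values()) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        return False
    return abs(u1 - n1 * n2 / 2) / math.sqrt(var_u) > Z_CRIT


def simulate(weights_a, weights_b, trials=2000, seed=0):
    """Return the rejection rates (t-test, Mann-Whitney) over simulated experiments."""
    rng = random.Random(seed)
    t_hits = u_hits = 0
    for _ in range(trials):
        a = rng.choices(LEVELS, weights=weights_a, k=N_PER_GROUP)
        b = rng.choices(LEVELS, weights=weights_b, k=N_PER_GROUP)
        t_hits += t_test_rejects(a, b)
        u_hits += mann_whitney_rejects(a, b)
    return t_hits / trials, u_hits / trials


null_t, null_u = simulate(NULL, NULL)       # no true difference: false positive rate
alt_t, alt_u = simulate(NULL, SHIFTED)      # true difference: sensitivity
print(f"false positive rate: t-test {null_t:.3f}, Mann-Whitney {null_u:.3f}")
print(f"sensitivity:         t-test {alt_t:.3f}, Mann-Whitney {alt_u:.3f}")
```

Both tests see the same simulated samples in every trial, so any difference in their rejection rates comes from the tests themselves rather than from sampling noise between runs.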


Determining the Veracity of Rumours on Twitter

arXiv.org Machine Learning

While social networks can provide an ideal platform for up-to-date information from individuals across the world, they have also proved to be a place where rumours fester and accidental or deliberate misinformation often emerges. In this article, we aim to support the task of making sense of social media data, and specifically, seek to build an autonomous message-classifier that filters relevant and trustworthy information from Twitter. For our work, we collected about 100 million public tweets, including users' past tweets, from which we identified 72 rumours (41 true, 31 false). We considered over 80 trustworthiness measures including the authors' profile and past behaviour, the social network connections (graphs), and the content of tweets themselves. We ran modern machine-learning classifiers over those measures to produce trustworthiness scores at various time windows from the outbreak of the rumour. Such time-windows were key as they allowed useful insight into the progression of the rumours. From our findings, we identified that our model was significantly more accurate than similar studies in the literature. We also identified critical attributes of the data that give rise to the trustworthiness scores assigned. Finally, we developed a software demonstration that provides a visual user interface to allow the user to examine the analysis.


What is a Confusion Matrix in Machine Learning - Machine Learning Mastery

#artificialintelligence

This matrix can be used for 2-class problems where it is very easy to understand, but can easily be applied to problems with 3 or more class values, by adding more rows and columns to the confusion matrix. Let's make this explanation of creating a confusion matrix concrete with an example. Let's pretend we have a two-class classification problem of predicting whether a photograph contains a man or a woman. We have a test dataset of 10 records with expected outcomes and a set of predictions from our classification algorithm. Let's start off and calculate the classification accuracy for this set of predictions. The algorithm made 7 of the 10 predictions correct with an accuracy of 70%. First, we must calculate the number of correct predictions for each class. Now, we can calculate the number of incorrect predictions for each class, organized by the predicted value.
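The steps described above can be sketched in a few lines of Python. The ten labels below are hypothetical, chosen only so that 7 of the 10 predictions are correct as in the text; they are not necessarily the records used in the original article.

```python
from collections import Counter

# Hypothetical test set of 10 records (assumed data, 7 of 10 predictions correct).
expected = ["man", "man", "woman", "man", "woman",
            "woman", "woman", "man", "man", "woman"]
predicted = ["woman", "man", "woman", "man", "man",
             "woman", "woman", "man", "woman", "woman"]


def accuracy(expected, predicted):
    """Fraction of predictions that match the expected outcome."""
    correct = sum(e == p for e, p in zip(expected, predicted))
    return correct / len(expected)


def confusion_matrix(expected, predicted):
    """Count every (expected, predicted) pair."""
    return Counter(zip(expected, predicted))


matrix = confusion_matrix(expected, predicted)
labels = sorted(set(expected) | set(predicted))

print(f"accuracy = {accuracy(expected, predicted):.0%}")

# One row per predicted value, one column per expected value.
print("predicted/expected", *labels, sep="\t")
for p in labels:
    print(p, *(matrix[(e, p)] for e in labels), sep="\t")
```

The diagonal cells hold the correct predictions for each class and the off-diagonal cells hold the errors, so the matrix shows which class is being confused with which, something the single accuracy figure cannot do.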


Deriving reproducible biomarkers from multi-site resting-state data: An Autism-based example

arXiv.org Machine Learning

Resting-state functional Magnetic Resonance Imaging (RfMRI) holds the promise to reveal functional biomarkers of neuropsychiatric disorders. However, extracting such biomarkers is challenging for complex multifaceted neuropathologies, such as autism spectrum disorders. Large multi-site datasets increase sample sizes to compensate for this complexity, at the cost of uncontrolled heterogeneity. This heterogeneity raises new challenges, akin to those faced in realistic diagnostic applications. Here, we demonstrate the feasibility of inter-site classification of neuropsychiatric status, with an application to the Autism Brain Imaging Data Exchange (ABIDE) database, a large (N = 871) multi-site autism dataset. For this purpose, we investigate pipelines that extract the most predictive biomarkers from the data. These RfMRI pipelines build participant-specific connectomes from functionally-defined brain areas. Connectomes are then compared across participants to learn patterns of connectivity that differentiate typical controls from individuals with autism. We predict this neuropsychiatric status for participants from the same acquisition sites or from different, unseen ones. Good choices of methods for the various steps of the pipeline lead to 67% prediction accuracy on the full ABIDE data, which is significantly better than previously reported results. We perform extensive validation on multiple subsets of the data defined by different inclusion criteria. This enables detailed analysis of the factors contributing to successful connectome-based prediction. First, prediction accuracy improves as we include more subjects, up to the maximum number of subjects available. Second, the definition of functional brain areas is of paramount importance for biomarker discovery: brain areas extracted from large RfMRI datasets outperform reference atlases in the classification tasks.
Keywords: data heterogeneity, resting-state fMRI, data pipelines, biomarkers, connectome, autism spectrum disorders
1. Introduction
In psychiatry, as in other fields of medicine, both the standardized observation of signs and the symptom profile are critical for diagnosis. However, compared to other fields of medicine, psychiatry lacks accompanying objective markers that could lead to more refined diagnoses and targeted treatment [1]. Advances in noninvasive brain imaging techniques and analyses (e.g. [2, 3]) are showing great promise for uncovering patterns of brain structure and function that can be used as objective measures of mental illness.


Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity

arXiv.org Machine Learning

Functional brain networks are well described and estimated from data with Gaussian Graphical Models (GGMs), e.g. using sparse inverse covariance estimators. Comparing functional connectivity of subjects in two populations calls for comparing these estimated GGMs. Our goal is to identify differences in GGMs known to have similar structure. We characterize the uncertainty of differences with confidence intervals obtained using a parametric distribution on parameters of a sparse estimator. Sparse penalties enable statistical guarantees and interpretable models even in high-dimensional and low-sample settings. Characterizing the distributions of sparse models is inherently challenging as the penalties produce a biased estimator. Recent work invokes the sparsity assumptions to effectively remove the bias from a sparse estimator such as the lasso. These distributions can be used to give confidence intervals on edges in GGMs, and by extension their differences. However, in the case of comparing GGMs, these estimators do not make use of any assumed joint structure among the GGMs. Inspired by priors from brain functional connectivity, we derive the distribution of parameter differences under a joint penalty when parameters are known to be sparse in the difference. This leads us to introduce the debiased multi-task fused lasso, whose distribution can be characterized in an efficient manner. We then show how the debiased lasso and multi-task fused lasso can be used to obtain confidence intervals on edge differences in GGMs. We validate the proposed techniques on a set of synthetic examples as well as on a neuroimaging dataset created for the study of autism.