Accuracy
When Even Genome Sequencing Doesn't Give a Diagnosis
Four-year-old Beckett Edwards has had the works when it comes to genetic testing. And his family still doesn't have an answer. Soon after he was born, his parents, Eric and Tricia, noticed that his muscles were floppy. By age two and a half, Beckett had begun losing his 40- to 50-word vocabulary. Now he's able to say only a handful of words and mostly babbles. Like any parents whose child isn't well, Eric and Tricia, who live in Los Angeles, want a diagnosis.
Predicting Cognitive Decline with Deep Learning of Brain Metabolism and Amyloid Imaging
Choi, Hongyoon, Jin, Kyong Hwan
For effective treatment of Alzheimer disease (AD), it is important to identify subjects who are most likely to exhibit rapid cognitive decline. Herein, we developed a novel framework based on a deep convolutional neural network which can predict future cognitive decline in mild cognitive impairment (MCI) patients using fluorodeoxyglucose and florbetapir positron emission tomography (PET). The architecture of the network relies only on baseline PET studies of AD and normal subjects as the training dataset. Feature extraction and complicated image preprocessing including nonlinear warping are unnecessary for our approach. Accuracy of prediction (84.2%) for conversion to AD in MCI patients outperformed conventional feature-based quantification approaches. ROC analyses revealed that performance of the CNN-based approach was significantly higher than that of the conventional quantification methods (p < 0.05). Output scores of the network were strongly correlated with the longitudinal change in cognitive measurements. These results show the feasibility of deep learning as a tool for predicting disease outcome using brain images.
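The abstract reports both a prediction accuracy and ROC analyses. As a minimal sketch of how these two figures can be computed from a model's output scores, the snippet below uses made-up labels and scores (they are not data from the study), and the function names `roc_auc` and `accuracy` are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = converted to AD, 0 = did not convert.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_acc = accuracy(toy_labels, toy_scores)
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason abstracts often report both.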
Integrating Additional Knowledge Into Estimation of Graphical Models
In applications of graphical models, we typically have more information than just the samples themselves. A prime example is the estimation of brain connectivity networks based on fMRI data, where in addition to the samples themselves, the spatial positions of the measurements are readily available. With particular regard for this application, we are thus interested in ways to incorporate additional knowledge most effectively into graph estimation. Our approach to this is to make neighborhood selection receptive to additional knowledge by strengthening the role of the tuning parameters. We demonstrate that this concept (i) can improve reproducibility, (ii) is computationally convenient and efficient, and (iii) carries a lucid Bayesian interpretation. We specifically show that the approach provides effective estimations of brain connectivity graphs from fMRI data. However, providing a general scheme for the inclusion of additional knowledge, our concept is expected to have applications in a wide range of domains.
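One way to read "strengthening the role of the tuning parameters" is to give each candidate edge its own penalty in the node-wise lasso regressions of neighborhood selection. The objective below is a sketch under that assumption, not the formulation from the paper: the symbols $X_j$ (column $j$ of the data matrix), $X_{-j}$ (all other columns), $n$ (sample size), and the edge-specific penalties $\lambda_{jk}$ are introduced here for illustration.

```latex
\hat{\beta}^{(j)} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p-1}}
  \left\{ \frac{1}{2n} \bigl\| X_j - X_{-j}\beta \bigr\|_2^2
  + \sum_{k \neq j} \lambda_{jk} \, |\beta_k| \right\},
\qquad
\text{edge } (j,k) \text{ is selected if } \hat{\beta}^{(j)}_k \neq 0 .
```

Under this reading, additional knowledge such as the spatial distance between two measurement sites would enter through $\lambda_{jk}$, for example with a larger penalty for distant pairs, so that such edges need stronger support from the data to be selected.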
40 Interview Questions asked at Startups in Machine Learning / Data Science
This article was posted by Manish Saraswat on Analytics Vidhya. Manish, who works in marketing and Data Science at Analytics Vidhya, believes that education can change this world. R, Data Science and Machine Learning keep him busy. Machine learning and data science are being looked at as the drivers of the next industrial revolution happening in the world today. This also means that there are numerous exciting startups looking for data scientists.
When PR and reality collide: The truth about machine learning in cybersecurity
Machine learning (ML) is routinely cited by post-truth vendors as their biggest selling point, their main advantage. But ML – if it's done properly – comes with problems and limitations. ESET has spent years perfecting automated detections, our name for ML in the cybersecurity context. Here are some of the biggest challenges we have observed and overcome in the course of implementing this technology in our business and home solutions. First, to use machine learning you need a lot of inputs, every one of which must be correctly labeled.
Semi-supervised classification for dynamic Android malware detection
Chen, Li, Zhang, Mingwei, Yang, Chih-Yuan, Sahita, Ravi
A growing number of threats to Android phones creates challenges for malware detection. Manually labeling samples as benign or as members of different malicious families requires tremendous human effort, while it is comparatively easy and cheap to obtain a large amount of unlabeled APKs from various sources. Moreover, the fast-paced evolution of Android malware continuously generates derivative malware families. These families often contain new signatures, which can escape detection when using static analysis. These practical challenges can also cause traditional supervised machine learning algorithms to degrade in performance. In this paper, we propose a framework that uses a model-based semi-supervised (MBSS) classification scheme on dynamic Android API call logs. The semi-supervised approach efficiently uses the labeled and unlabeled APKs to estimate a finite mixture model of Gaussian distributions via conditional expectation-maximization and efficiently detects malware during out-of-sample testing. We compare MBSS with popular malware detection classifiers such as support vector machine (SVM), $k$-nearest neighbor (kNN) and linear discriminant analysis (LDA). Under the ideal classification setting, MBSS has competitive performance with 98% accuracy and a very low false positive rate for in-sample classification. For out-of-sample testing, the out-of-sample test data exhibit behavior similar to that of the in-sample training set in retrieving phone information and sending it to the network. When this similarity is strong, MBSS and SVM with linear kernel maintain a 90% detection rate while kNN and LDA suffer great performance degradation. When this similarity is slightly weaker, all classifiers degrade in performance, but MBSS still performs significantly better than other classifiers.
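The idea of fitting a Gaussian mixture from labeled and unlabeled samples together can be illustrated with a small sketch. The code below is a toy, one-dimensional version using plain EM, not the conditional EM or the multivariate API-call features described in the abstract; the function names and the data are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_semi_supervised_gmm(labeled, unlabeled, n_classes, n_iter=50):
    """labeled: list of (x, class_index); unlabeled: list of x.

    Returns one [weight, mean, variance] triple per class.
    """
    # Initialise each component from the labeled points only.
    params = []
    for k in range(n_classes):
        xs = [x for x, y in labeled if y == k]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs) + 1e-6
        params.append([len(xs) / len(labeled), mean, var])

    xs_all = [x for x, _ in labeled] + list(unlabeled)
    for _ in range(n_iter):
        # E-step: labeled points keep one-hot responsibilities,
        # unlabeled points get posterior class probabilities.
        resp = [[1.0 if y == k else 0.0 for k in range(n_classes)]
                for _, y in labeled]
        for x in unlabeled:
            dens = [w * gaussian_pdf(x, m, v) for w, m, v in params]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of weight, mean and variance.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            mean = sum(r[k] * x for r, x in zip(resp, xs_all)) / nk
            var = sum(r[k] * (x - mean) ** 2
                      for r, x in zip(resp, xs_all)) / nk + 1e-6
            params[k] = [nk / len(xs_all), mean, var]
    return params


def predict(x, params):
    """Assign x to the class with the largest weighted density."""
    scores = [w * gaussian_pdf(x, m, v) for w, m, v in params]
    return scores.index(max(scores))


# Hypothetical toy data: class 0 near 0.5, class 1 near 9.5.
toy_labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
toy_unlabeled = [0.5, 1.5, -0.5, 0.2, 9.5, 10.5, 8.5, 9.8]
toy_params = fit_semi_supervised_gmm(toy_labeled, toy_unlabeled, n_classes=2)
```

The labeled points anchor each mixture component to a known class, while the unlabeled points refine the estimated means and variances; this is the sense in which cheap unlabeled samples can help when labels are scarce.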
Large-Scale Online Semantic Indexing of Biomedical Articles via an Ensemble of Multi-Label Classification Models
Papanikolaou, Yannis, Tsoumakas, Grigorios, Laliotis, Manos, Markantonatos, Nikos, Vlahavas, Ioannis
Background: In this paper we present the approaches and methods employed in order to deal with a large-scale multi-label semantic indexing task of biomedical papers. This work was mainly implemented within the context of the BioASQ challenge of 2014. Methods: The main contribution of this work is a multi-label ensemble method that incorporates a McNemar statistical significance test in order to validate the combination of the constituent machine learning algorithms. Some secondary contributions include a study on the temporal aspects of the BioASQ corpus (observations apply also to BioASQ's superset, the PubMed articles collection) and the proper adaptation of the algorithms used to deal with this challenging classification task. Results: The ensemble method we developed is compared to other approaches in experimental scenarios with subsets of the BioASQ corpus, giving positive results. During the BioASQ 2014 challenge we placed first in the first batch and third in the two following batches. Our success in the BioASQ challenge proved that a fully automated machine-learning approach, which does not implement any heuristics and rule-based approaches, can be highly competitive and outperform other approaches in similar challenging contexts.
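A McNemar test compares two classifiers on the same examples by looking only at the cases where exactly one of them is correct. The snippet below is a minimal sketch of the exact (binomial) two-sided version for a single binary label; how the paper applies the test inside its ensemble is not stated in the abstract, and the function name and data here are hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts cases only A gets right
    and c counts cases only B gets right.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        # The classifiers never disagree in correctness: no evidence either way.
        return b, c, 1.0
    k = min(b, c)
    # Under the null hypothesis, b follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical toy data: A is always right, B is right on 2 of 10 cases.
toy_truth = [1] * 10
toy_pred_a = [1] * 10
toy_pred_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_b, toy_c, toy_p = mcnemar_exact(toy_truth, toy_pred_a, toy_pred_b)
```

A small p-value suggests the two classifiers differ in accuracy on that label, which is the kind of evidence an ensemble could use when deciding whether to prefer one constituent model over another.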
6 AI Cybersecurity Startups Keeping You Safe - Nanalyze
The war between machines likely won't be fought across some bomb-blasted hell-scape, with humans scuttling about like roaches trying to avoid being squashed. Rather, machines will fight it out over fiber optic connections, with the battleground being computer servers and laptops containing valuable information. You'll recall that monochromatic pant suits weren't Hillary Clinton's only problem: Russia (or some obese, Big Gulp-slurping teen in his mom's basement) hacked her private emails. Cybersecurity is still the domain of humans, but the job is increasingly being turned over to predictive systems that use various forms of artificial intelligence that do everything from protecting financial information to detecting fraudulent behavior. It's no secret that cybersecurity is big business.
Machine Learning Finds "Fake News" with 88% Accuracy
Since the 2016 presidential election, one topic dominating political discourse is the issue of "Fake News". A number of political pundits claim that the rise of significantly biased and/or untrue news influenced the election, though a study by researchers from Stanford and New York University concluded otherwise. Nonetheless, fake news posts have exploited Facebook users' feeds to propagate throughout the internet. Obviously, a deliberately misleading story is "fake news", but lately blathering social media discourse is changing its definition. Some now use the term to dismiss facts counter to their preferred viewpoints, the most prominent example being President Trump.
Boosting with Structural Sparsity: A Differential Inclusion Approach
Huang, Chendi, Sun, Xinwei, Xiong, Jiechao, Yao, Yuan
Boosting, viewed as a gradient descent algorithm, is a popular method in machine learning. In this paper a novel Boosting-type algorithm is proposed based on restricted gradient descent with structural sparsity control whose underlying dynamics are governed by differential inclusions. In particular, we present an iterative regularization path with structural sparsity where the parameter is sparse under some linear transforms, based on variable splitting and the Linearized Bregman Iteration. Hence it is called Split LBI. Despite its simplicity, Split LBI outperforms the popular generalized Lasso in both theory and experiments. A theory of path consistency is presented: equipped with proper early stopping, Split LBI may achieve model selection consistency under a family of Irrepresentable Conditions which can be weaker than the necessary and sufficient condition for generalized Lasso. Furthermore, some $\ell_2$ error bounds are also given at the minimax optimal rates. The utility and benefit of the algorithm are illustrated by several applications including image denoising, partial order ranking of sport teams, and world university grouping with crowdsourced ranking data.
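For orientation, the iteration can be sketched for a squared loss as follows. The abstract does not define any of these symbols, so everything here is an assumption for illustration: $X$ and $y$ are the design matrix and response, $D$ is the linear transform under which the parameter is sparse, $\gamma$ is the split variable standing in for $D\beta$, $\nu$ controls the splitting gap, $\alpha$ is the step size, $\kappa$ is a damping factor, and $S$ is soft-thresholding.

```latex
\ell(\beta, \gamma) = \frac{1}{2n} \| y - X\beta \|_2^2
                    + \frac{1}{2\nu} \| \gamma - D\beta \|_2^2 ,
\qquad \beta_0 = \gamma_0 = z_0 = 0,

\beta_{k+1}  = \beta_k - \kappa \alpha \, \nabla_{\beta}  \ell(\beta_k, \gamma_k), \\
z_{k+1}      = z_k     - \alpha        \, \nabla_{\gamma} \ell(\beta_k, \gamma_k), \\
\gamma_{k+1} = \kappa \, S(z_{k+1}, 1),
\qquad S(z, \lambda) = \operatorname{sign}(z) \max(|z| - \lambda, 0).
```

In this sketch the iteration count plays the role of the regularization parameter: components of $\gamma$ enter the support one after another as $k$ grows, and stopping early selects a sparse model along the path.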