Performance Analysis
Microsoft's newest milestone? World's lowest error rate in speech recognition ZDNet
The techniques Microsoft Research used to achieve a new world-best error rate will eventually enhance the Cortana Windows 10 personal assistant. Microsoft claims to have achieved the world's lowest error rate for speech recognition, as the company jostles with Amazon, Apple, Google, and IBM to develop products that understand speech as well as humans can. According to Microsoft, its speech scientists at Microsoft Research have achieved a word error rate (WER) of just 6.3 percent under an industry-standard evaluation, using techniques that will eventually enhance Cortana. The previous lowest error rate was 6.9 percent, achieved by IBM's Watson team, which beat their own record of eight percent set last year. Both Microsoft and IBM presented papers detailing their work on speech recognition at the Interspeech conference in San Francisco this week, where papers were also presented by Google's speech researchers.
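For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below illustrates that general definition only; the function name and the sample sentences are made up for illustration, and it does not reproduce the scoring rules of the industry-standard evaluation mentioned in the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a WER of 6.3 percent corresponds to roughly 6 word-level errors per 100 reference words.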
False Discoveries Occur Early on the Lasso Path
Su, Weijie, Bogdan, Malgorzata, Candes, Emmanuel
In regression settings where explanatory variables have very low correlations and there are relatively few effects, each of large magnitude, we expect the Lasso to find the important variables with few errors, if any. This paper shows that in a regime of linear sparsity---meaning that the fraction of variables with a non-vanishing effect tends to a constant, however small---this cannot really be the case, even when the design variables are stochastically independent. We demonstrate that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are. We derive a sharp asymptotic trade-off between false and true positive rates or, equivalently, between measures of type I and type II errors along the Lasso path. This trade-off states that if we ever want to achieve a type II error (false negative rate) under a critical value, then anywhere on the Lasso path the type I error (false positive rate) will need to exceed a given threshold so that we can never have both errors at a low level at the same time. Our analysis uses tools from approximate message passing (AMP) theory as well as novel elements to deal with a possibly adaptive selection of the Lasso regularizing parameter.
Generic OS X Malware Detection Method Explained
When it comes to detecting OS X malware, the future may not be rooted in machine learning algorithms, but patterns and heatmap visualization, a researcher posits. In an academic paper published by Virus Bulletin on Monday, Vincent Van Mieghem, a former student at the Delft University of Technology in the Netherlands, describes how a recurring pattern he observed in OS X system calls can be used to indicate the presence of malware. Van Mieghem wrote the paper, "Behavioral Detection and Prevention of Malware on OS X," (.PDF) while interning at Fox-IT but has since moved on to PricewaterhouseCoopers' cybersecurity division. By the numbers, the detection method Van Mieghem concocted is a success; it detected infections from 100 percent of malware samples found on OS X systems at the time. The method apparently leaves little room for error too; it resulted in a scant 0 percent to 20 percent false positive rate, depending on the user, according to the paper.
Mapping the Similarities of Spectra: Global and Locally-biased Approaches to SDSS Galaxy Data
Lawlor, David, Budavári, Tamás, Mahoney, Michael W.
We apply a novel spectral graph technique, that of locally-biased semi-supervised eigenvectors, to study the diversity of galaxies. This technique permits us to characterize empirically the natural variations in observed spectra data, and we illustrate how this approach can be used in an exploratory manner to highlight both large-scale global as well as small-scale local structure in Sloan Digital Sky Survey (SDSS) data. We use this method in a way that simultaneously takes into account the measurements of spectral lines as well as the continuum shape. Unlike Principal Component Analysis, this method does not assume that the Euclidean distance between galaxy spectra is a good global measure of similarity between all spectra, but instead it only assumes that local difference information between similar spectra is reliable. Moreover, unlike other nonlinear dimensionality methods, this method can be used to characterize very finely both small-scale local as well as large-scale global properties of realistic noisy data. The power of the method is demonstrated on the SDSS Main Galaxy Sample by illustrating that the derived embeddings of spectra carry an unprecedented amount of information. By using a straightforward global or unsupervised variant, we observe that the main features correlate strongly with star formation rate and that they clearly separate active galactic nuclei. Computed parameters of the method can be used to describe line strengths and their interdependencies. By using a locally-biased or semi-supervised variant, we are able to focus on typical variations around specific objects of astronomical interest. We present several examples illustrating that this approach can enable new discoveries in the data as well as a detailed understanding of very fine local structure that would otherwise be overwhelmed by large-scale noise and global trends in the data.
Data Science Competitions 101: Anatomy and Approach
I recently participated in a weekend-long data science hackathon titled 'The Smart Recruits'. Organized by the amazing folks at Analytics Vidhya, it saw some serious competition. Although my performance can be classified as decent at best (47 out of 379 participants), it was among the more satisfying competitions I have participated in on both AV and Kaggle over the last few months. Thus, I decided it might be worthwhile to try and share some insights as a data science autodidact. The competition required us to use historical data to create a model to help an organization pick out better recruits. The evaluation metric used for judging the predictions was AUC (area under the ROC curve).
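As a rough illustration of the evaluation metric, AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are assumptions for illustration and are not taken from the competition.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    This pairwise form is quadratic in the data size, so it suits small examples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: three of the four positive/negative pairs are ordered correctly.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because only the ordering of scores matters, AUC does not depend on any particular decision threshold.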
Adversarial machine learning
I just got back from a very good conference organized by startup.ml. Please read on for my comments on part of one of the very good talks. Classic machine learning (especially as it is taught in classes) emphasizes a nice safe static environment where you are given some unchanging data and are asked to produce a nice predictive model one time. It is formally easier than causal inference or statistical inference, as being right is often enough, no matter what the reason. Adversarial machine learning is the formal name for studying what happens when conceding even a slightly more realistic alternative to assumptions of these types (harmlessly called "relaxing assumptions").
Performance measures in Azure ML: Accuracy, Precision, Recall and F1 Score.
This is the first of three articles about performance measures and graphs for binary learning models in Azure ML. Binary learning models are models which predict just one of two outcomes: positive or negative. These models are very well suited to drive decisions, such as whether to administer a certain drug to a patient or to include a lead in a targeted marketing campaign. This first article lays the foundation by covering several statistical measures: accuracy, precision, recall and F1 score. These measures require a solid understanding of the two types of prediction errors, which we will also cover: false positives and false negatives. In the second article we'll discuss the ROC curve and the related AUC measure. We'll also look at another graph in Azure ML called the Precision/Recall curve.
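The four measures named above can be sketched from the counts of true and false positives and negatives, using their standard textbook definitions. This is a plain Python illustration, not Azure ML code; the function name and the toy labels are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels encoded as 0 or 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 0, 1, 0, 1, 0, 0, 1]
    print(binary_metrics(truth, predicted))
```

Precision penalizes false positives and recall penalizes false negatives, which is why the two error types need to be understood before the measures themselves.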
Sophos false positive detection ruins weekend for some Windows users
A bad malware signature caused Sophos antivirus products to detect a critical Windows file as malicious on Sunday, preventing some users from accessing their computers. Because the file was blocked, some users who attempted to log into their computers were greeted by a black screen. Sophos issued an update to fix the problem within a few hours and said that the issue only affected a specific 32-bit version of Windows 7 SP1 and not Windows XP, Vista, 8 or 10. "Based on current case volume and customer feedback, we believe the number of impacted systems to be minimal and confined to a small number of cases," the company said in a support article. One Twitter user who was affected by the issue said that he highly doubts only a small number of customers were affected, while another one reported that he's been on hold trying to reach Sophos Support by phone for over two hours. "An email would have been nice," one user told Sophos via Twitter.
Using Kernel Methods and Model Selection for Prediction of Preterm Birth
Vovsha, Ilia, Salleb-Aouissi, Ansaf, Raja, Anita, Koch, Thomas, Rybchuk, Alex, Radeva, Axinia, Rajan, Ashwath, Huang, Yiwen, Diab, Hatim, Tomar, Ashish, Wapner, Ronald
We describe an application of machine learning to the problem of predicting preterm birth. We conduct a secondary analysis on a clinical trial dataset collected by the National Institute of Child Health and Human Development (NICHD) while focusing our attention on predicting different classes of preterm birth. We compare three approaches for deriving predictive models: a support vector machine (SVM) approach with linear and non-linear kernels, logistic regression with different model selection, and a model based on decision rules prescribed by physician experts for prediction of preterm birth. Our approach highlights the pre-processing methods applied to handle the inherent dynamics, noise and gaps in the data and describes techniques used to handle skewed class distributions. Empirical experiments demonstrate significant improvement in predicting preterm birth compared to past work.
Decoding visual stimuli in human brain by using Anatomical Pattern Analysis on fMRI images
Yousefnezhad, Muhammad, Zhang, Daoqiang
A universal unanswered question in neuroscience and machine learning is whether computers can decode the patterns of the human brain. Multi-Voxels Pattern Analysis (MVPA) is a critical tool for addressing this question. However, previous MVPA methods face two challenges: decreasing sparsity and noise in the extracted features, and increasing the performance of prediction. To overcome these challenges, this paper proposes Anatomical Pattern Analysis (APA) for decoding visual stimuli in the human brain. This framework develops a novel anatomical feature extraction method and a new imbalance AdaBoost algorithm for binary classification. Further, it utilizes an Error-Correcting Output Codes (ECOC) method for multi-class prediction. APA can automatically detect active regions for each category of the visual stimuli. Moreover, it enables us to combine homogeneous datasets for applying advanced classification. Experimental studies on 4 visual categories (words, consonants, objects and scrambled photos) demonstrate that the proposed approach achieves superior performance to state-of-the-art methods.