Accuracy
Microsoft's newest milestone? World's lowest error rate in speech recognition ZDNet
The techniques Microsoft Research used to achieve a new world-best error rate will eventually enhance the Cortana Windows 10 personal assistant. Microsoft claims to have achieved the world's lowest error rate for speech recognition, as the company jostles with Amazon, Apple, Google, and IBM to develop products that understand speech as well as humans can. According to Microsoft, its speech scientists at Microsoft Research have achieved a word error rate (WER) of just 6.3 percent under an industry-standard evaluation, using techniques that will eventually enhance Cortana. The previous lowest error rate was 6.9 percent, achieved by IBM's Watson team, which beat their own record of eight percent set last year. Both Microsoft and IBM presented papers detailing their work on speech recognition at the Interspeech conference in San Francisco this week, where papers were also presented by Google's speech researchers.
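The word error rate quoted above is conventionally computed as the number of word-level substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that conventional definition, not the evaluation tooling used by Microsoft or IBM; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a WER of 6.3 percent corresponds to roughly 63 word errors per 1,000 reference words.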
Sparse Tensor Graphical Model: Non-convex Optimization and Statistical Inference
Sun, Will Wei, Wang, Zhaoran, Lyu, Xiang, Liu, Han, Cheng, Guang
We consider the estimation and inference of sparse graphical models that characterize the dependency structure of high-dimensional tensor-valued data. To facilitate the estimation of the precision matrix corresponding to each way of the tensor, we assume the data follow a tensor normal distribution whose covariance has a Kronecker product structure. A critical challenge in the estimation and inference of this model is the fact that its penalized maximum likelihood estimation involves minimizing a non-convex objective function. To address it, this paper makes two contributions: (i) In spite of the non-convexity of this estimation problem, we prove that an alternating minimization algorithm, which iteratively estimates each sparse precision matrix while fixing the others, attains an estimator with the optimal statistical rate of convergence. Notably, such an estimator achieves estimation consistency with only one tensor sample, which was not observed in previous work. (ii) We propose a de-biased statistical inference procedure for testing hypotheses on the true support of the sparse precision matrices, and employ it for testing a growing number of hypotheses with false discovery rate (FDR) control. The asymptotic normality of our test statistic and the consistency of the FDR control procedure are established. Our theoretical results are further backed up by thorough numerical studies. We implement the methods in a publicly available R package, Tlasso.
False Discoveries Occur Early on the Lasso Path
Su, Weijie, Bogdan, Malgorzata, Candes, Emmanuel
In regression settings where explanatory variables have very low correlations and there are relatively few effects, each of large magnitude, we expect the Lasso to find the important variables with few errors, if any. This paper shows that in a regime of linear sparsity---meaning that the fraction of variables with a non-vanishing effect tends to a constant, however small---this cannot really be the case, even when the design variables are stochastically independent. We demonstrate that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are. We derive a sharp asymptotic trade-off between false and true positive rates or, equivalently, between measures of type I and type II errors along the Lasso path. This trade-off states that if we ever want to achieve a type II error (false negative rate) under a critical value, then anywhere on the Lasso path the type I error (false positive rate) will need to exceed a given threshold so that we can never have both errors at a low level at the same time. Our analysis uses tools from approximate message passing (AMP) theory as well as novel elements to deal with a possibly adaptive selection of the Lasso regularizing parameter.
Generic OS X Malware Detection Method Explained
When it comes to detecting OS X malware, the future may not be rooted in machine learning algorithms, but patterns and heatmap visualization, a researcher posits. In an academic paper published by Virus Bulletin on Monday, Vincent Van Mieghem, a former student at the Delft University of Technology in the Netherlands, describes how a recurring pattern he observed in OS X system calls can be used to indicate the presence of malware. Van Mieghem wrote the paper, "Behavioral Detection and Prevention of Malware on OS X," (.PDF) while interning at Fox-IT but has since moved on to PricewaterhouseCoopers' cybersecurity division. By the numbers, the detection method Van Mieghem concocted is a success; it detected infections from 100 percent of malware samples found on OS X systems at the time. The method apparently leaves little room for error too; it resulted in a scant 0 percent to 20 percent false positive rate, depending on the user, according to the paper.
Mapping the Similarities of Spectra: Global and Locally-biased Approaches to SDSS Galaxy Data
Lawlor, David, Budavári, Tamás, Mahoney, Michael W.
We apply a novel spectral graph technique, that of locally-biased semi-supervised eigenvectors, to study the diversity of galaxies. This technique permits us to characterize empirically the natural variations in observed spectra data, and we illustrate how this approach can be used in an exploratory manner to highlight both large-scale global as well as small-scale local structure in Sloan Digital Sky Survey (SDSS) data. We use this method in a way that simultaneously takes into account the measurements of spectral lines as well as the continuum shape. Unlike Principal Component Analysis, this method does not assume that the Euclidean distance between galaxy spectra is a good global measure of similarity between all spectra, but instead it only assumes that local difference information between similar spectra is reliable. Moreover, unlike other nonlinear dimensionality methods, this method can be used to characterize very finely both small-scale local as well as large-scale global properties of realistic noisy data. The power of the method is demonstrated on the SDSS Main Galaxy Sample by illustrating that the derived embeddings of spectra carry an unprecedented amount of information. By using a straightforward global or unsupervised variant, we observe that the main features correlate strongly with star formation rate and that they clearly separate active galactic nuclei. Computed parameters of the method can be used to describe line strengths and their interdependencies. By using a locally-biased or semi-supervised variant, we are able to focus on typical variations around specific objects of astronomical interest. We present several examples illustrating that this approach can enable new discoveries in the data as well as a detailed understanding of very fine local structure that would otherwise be overwhelmed by large-scale noise and global trends in the data.
Adversarial machine learning
I just got back from a very good conference organized by startup.ml. Please read on for my comments on part of one of the very good talks. Classic machine learning (especially as it is taught in classes) emphasizes a nice safe static environment where you are given some unchanging data and are asked to produce a nice predictive model one time. It is formally easier than causal inference or statistical inference, as being right is often enough, no matter what the reason. Adversarial machine learning is the formal name for studying what happens when conceding even a slightly more realistic alternative to assumptions of these types (harmlessly called "relaxing assumptions").
Performance measures in Azure ML: Accuracy, Precision, Recall and F1 Score.
This is the first of three articles about performance measures and graphs for binary learning models in Azure ML. Binary learning models are models which just predict one of two outcomes: positive or negative. These models are very well suited to drive decisions, such as whether to administer a certain drug to a patient or to include a lead in a targeted marketing campaign. This first article lays the foundation by covering several statistical measures: accuracy, precision, recall and F1 score. These measures require a solid understanding of the two types of prediction errors, which we will also cover: false positives and false negatives. In the second article we'll discuss the ROC curve and the related AUC measure. We'll also look at another graph in Azure ML called the Precision/Recall curve.
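The four measures named above follow directly from the counts of true positives, false positives, true negatives, and false negatives. The sketch below uses the standard textbook definitions and is not Azure ML's own code; the function name `binary_metrics`, the 0/1 label encoding, and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Assumption: report 0.0 rather than raising when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    actual = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 0, 1, 0, 1, 0, 0, 1]
    print(binary_metrics(actual, predicted))
```

Precision penalizes false positives and recall penalizes false negatives, which is why the two error types need to be understood before the scores can be interpreted.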
Sophos false positive detection ruins weekend for some Windows users
A bad malware signature caused Sophos antivirus products to detect a critical Windows file as malicious on Sunday, preventing some users from accessing their computers. Because the file was blocked, some users who attempted to log into their computers were greeted by a black screen. Sophos issued an update to fix the problem within a few hours and said that the issue only affected a specific 32-bit version of Windows 7 SP1 and not Windows XP, Vista, 8 or 10. "Based on current case volume and customer feedback, we believe the number of impacted systems to be minimal and confined to a small number of cases," the company said in a support article. One Twitter user who was affected by the issue said that he highly doubts only a small number of customers were affected, while another one reported that he's been on hold trying to reach Sophos Support by phone for over two hours. "An email would have been nice," one user told Sophos via Twitter.
Using Kernel Methods and Model Selection for Prediction of Preterm Birth
Vovsha, Ilia, Salleb-Aouissi, Ansaf, Raja, Anita, Koch, Thomas, Rybchuk, Alex, Radeva, Axinia, Rajan, Ashwath, Huang, Yiwen, Diab, Hatim, Tomar, Ashish, Wapner, Ronald
We describe an application of machine learning to the problem of predicting preterm birth. We conduct a secondary analysis on a clinical trial dataset collected by the National Institute of Child Health and Human Development (NICHD) while focusing our attention on predicting different classes of preterm birth. We compare three approaches for deriving predictive models: a support vector machine (SVM) approach with linear and non-linear kernels, logistic regression with different model selection, and a model based on decision rules prescribed by physician experts for prediction of preterm birth. Our approach highlights the pre-processing methods applied to handle the inherent dynamics, noise and gaps in the data and describes techniques used to handle skewed class distributions. Empirical experiments demonstrate significant improvement in predicting preterm birth compared to past work.