Accuracy
Apple's Machine Learning Has Cut Siri's Error Rate by a Factor of Two
Steven Levy has published an in-depth article about Apple's artificial intelligence and machine learning efforts, after meeting with senior executives Craig Federighi, Eddy Cue, Phil Schiller, and two Siri scientists at the company's headquarters. Apple provided Levy with a closer look at how machine learning is deeply integrated into Apple software and services, led by Siri, which the article reveals has been powered by a neural-net-based system since 2014. Apple said the backend change greatly improved the personal assistant's accuracy. "This was one of those things where the jump was so significant that you do the test again to make sure that somebody didn't drop a decimal place," says Eddy Cue, Apple's senior vice president of internet software and services. Alex Acero, who leads the Siri speech team at Apple, said Siri's error rate has been lowered by more than a factor of two in many cases. "The error rate has been cut by a factor of two in all the languages, more than a factor of two in many cases," says Acero. "That's mostly due to deep learning and the way we have optimized it -- not just the algorithm itself but in the context of the whole end-to-end product."
Built on the backs of Junior Security Analysts
This was the most often asked question at this year's Black Hat conference (2016), especially of anyone with even a scent of machine learning algorithms in their product. The biggest issue facing the SOC is the inability to sift through thousands of alerts per day because of a shortage of staff. So it doesn't take a genius to arrive at the question: what is it going to cost me in man-hours to sift through a new mousetrap's false positives? How many more Junior Analysts do I need to add to my team to look over my box? In the last five years I've watched more and more SOCs being built on the backs of Junior Security Analysts.
Naive Bayes and Text Classification
Naive Bayes classifiers, a family of classifiers based on the popular Bayes' probability theorem, are known for creating simple yet well-performing models, especially in the fields of document classification and disease prediction. In this first part of a series, we will take a look at the theory of naive Bayes classifiers and introduce the basic concepts of text classification. In following articles, we will implement those concepts to train a naive Bayes spam filter and apply naive Bayes to song classification based on lyrics. Starting more than half a century ago, scientists became very serious about addressing the question: "Can we build a model that learns from available data and automatically makes the right decisions and predictions?" Looking back, this sounds almost like a rhetorical question, and the answer can be found in numerous applications that are emerging from the fields of pattern classification, machine learning, and artificial intelligence. Data from various sensing devices, combined with powerful learning algorithms and domain knowledge, led to many great inventions that we now take for granted in our everyday lives. Examples include Internet queries via search engines like Google, text recognition at the post office, barcode scanners at the supermarket, the diagnosis of diseases, and speech recognition by Siri or Google Now on our mobile phones.
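To make the theory concrete, here is a minimal sketch of a multinomial naive Bayes text classifier in plain Python. It makes three assumptions:

* documents are tokenized on whitespace;
* word probabilities use Laplace (add-alpha) smoothing;
* the function names `train_nb` and `predict_nb` are hypothetical and are not taken from the series' later implementation.

The model sums log-probabilities instead of multiplying raw probabilities. This avoids numerical underflow on long documents.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on whitespace-tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class label -> word frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    # Log prior: relative frequency of each class in the training set.
    priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    # Log likelihood of each word given the class, with add-alpha smoothing.
    likelihoods = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        likelihoods[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return priors, likelihoods, vocab


def predict_nb(model, doc):
    """Return the class with the highest (unnormalized) log posterior."""
    priors, likelihoods, vocab = model
    scores = {}
    for c in priors:
        score = priors[c]
        for w in doc.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += likelihoods[c][w]
        scores[c] = score
    return max(scores, key=scores.get)
```

The "naive" independence assumption is what lets the per-word log-likelihoods simply be added together.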
Entry Point Data
In this tutorial I want to provide a short overview of some of my favorite Python tools for common procedures, as entry points for general pattern classification and machine learning tasks and various other data analyses. In this section I want to recommend a way of installing the required Python packages if you have not done so yet; otherwise you can skip this part. Although the packages can be installed step by step "manually", I highly recommend taking a look at the Anaconda Python distribution for scientific computing. Anaconda is distributed by Continuum Analytics, but it is completely free and includes more than 195 packages for science and data analysis as of today.
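A quick way to see which packages are still missing is to ask the interpreter whether it can locate them. The sketch below does this with the standard library's `importlib.util.find_spec`.

The package list is a hypothetical example, since the tutorial's exact set is not given here. The printed `conda install` hint assumes the listed conda package names. Note that scikit-learn is imported as `sklearn` but installed as `scikit-learn`.

```python
import importlib.util

# Hypothetical example set: import name -> conda package name.
# For scikit-learn the import name and the package name differ.
PACKAGES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "sklearn": "scikit-learn",
}


def missing_packages(import_names):
    """Return the import names that the current interpreter cannot locate."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)  # iterating a dict yields its keys
    if missing:
        print("Missing:", ", ".join(missing))
        print("Suggested: conda install " + " ".join(PACKAGES[m] for m in missing))
    else:
        print("All listed packages were found.")
```

`find_spec` only checks that a module can be found; it does not import it. The check is therefore fast and has no side effects.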
Predictive modeling, supervised machine learning, and pattern classification
A Support Vector Machine (SVM) is a classification method that searches for hyperplanes separating two or more classes. The hyperplane with the largest margin is retained, where "margin" is defined as the minimum distance from the sample points to the hyperplane. The sample point(s) that define the margin are called support vectors and establish the final SVM model. Bayes classifiers are based on a statistical model, namely Bayes' theorem: posterior probabilities are calculated from the prior probability and the so-called likelihood. A naive Bayes classifier assumes that all attributes are conditionally independent given the class. Computing the likelihood thereby simplifies to the product of the conditional probabilities of observing individual attributes given a particular class label. Artificial Neural Networks (ANN) are graph-like classifiers that mimic the structure of a human or animal "brain", where the interconnected nodes represent the neurons. Decision tree classifiers are tree-like graphs, where nodes in the graph test certain conditions on a particular set of features, and branches split the decision towards the leaf nodes. Leaves represent the lowest level in the graph and determine the class labels. Trees are typically grown greedily: at each node, the algorithm chooses the split that minimizes Gini impurity or maximizes information gain.
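The two split criteria named for decision trees can be computed in a few lines. This is a minimal sketch on plain label lists. The function names are hypothetical helpers, and a full tree learner would evaluate these scores for every candidate split at every node.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_split(left, right):
    """Size-weighted Gini impurity of a binary split (lower is better)."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted
```

A perfectly separating split of a balanced two-class node, such as `["a", "a", "b", "b"]` into `["a", "a"]` and `["b", "b"]`, drives the weighted Gini impurity to 0 and yields an information gain of 1 bit.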
ROCS-Derived Features for Virtual Screening
Ligand-based virtual screening is based on the assumption that similar compounds have similar biological activity [Willett, 2009]. Compound similarity can be assessed in many ways, including comparisons of molecular "fingerprints" that encode structural features or molecular properties [Todeschini and Consonni, 2009] and measurements of shape, chemical, and/or electrostatic similarity in three dimensions [Hawkins et al., 2007; Muchmore et al., 2006; Ballester and Richards, 2007]. Three-dimensional approaches such as rapid overlay of chemical structures (ROCS) [Hawkins et al., 2007] are especially interesting because of their potential to identify molecules that are similar from the point of view of a target protein but dissimilar in underlying chemical structure ("scaffold hopping"; [Böhm et al., 2004]). ROCS represents atoms as three-dimensional Gaussian functions [Grant and Pickup, 1995; Grant et al., 1996] and calculates similarity as a function of volume overlaps between alignments of pre-generated molecular conformers. Chemical ("color") similarity is measured by overlaps between dummy atoms marking interesting chemical functionalities: hydrogen bond donors and acceptors, charged functional groups, rings, and hydrophobic groups.
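The Gaussian volume-overlap idea can be sketched briefly. The product of two spherical Gaussians is again a Gaussian, so their overlap integral has a closed form. A shape Tanimoto score then follows as V_AB / (V_AA + V_BB - V_AB).

This is an illustrative sketch, not the ROCS implementation, and it rests on several assumptions:

* it uses only first-order (atom-pair) overlaps;
* every atom gets the same assumed radius;
* the amplitude is set to p = 2*sqrt(2), and each exponent is derived so that a single Gaussian has the volume of a hard sphere of that radius;
* the two molecules are already aligned, whereas ROCS also optimizes the alignment and adds the "color" term.

```python
import math

P_AMPLITUDE = 2.0 * math.sqrt(2.0)  # assumed Gaussian amplitude p = 2*sqrt(2)


def alpha_from_radius(radius, p=P_AMPLITUDE):
    """Exponent chosen so that p * exp(-alpha r^2) integrates to the hard-sphere volume."""
    return math.pi * (3.0 * p / (4.0 * math.pi * radius ** 3)) ** (2.0 / 3.0)


def gaussian_overlap(c1, c2, alpha1, alpha2, p1=P_AMPLITUDE, p2=P_AMPLITUDE):
    """Closed-form integral of the product of two spherical Gaussians centred at c1, c2."""
    d2 = sum((a - b) ** 2 for a, b in zip(c1, c2))
    s = alpha1 + alpha2
    return p1 * p2 * math.exp(-alpha1 * alpha2 * d2 / s) * (math.pi / s) ** 1.5


def overlap_volume(mol_a, mol_b, radius=1.7):
    """First-order overlap: sum over all atom pairs; all atoms share one assumed radius."""
    alpha = alpha_from_radius(radius)
    return sum(gaussian_overlap(a, b, alpha, alpha) for a in mol_a for b in mol_b)


def shape_tanimoto(mol_a, mol_b, radius=1.7):
    """Shape Tanimoto of two pre-aligned molecules given as lists of (x, y, z) atoms."""
    v_ab = overlap_volume(mol_a, mol_b, radius)
    v_aa = overlap_volume(mol_a, mol_a, radius)
    v_bb = overlap_volume(mol_b, mol_b, radius)
    return v_ab / (v_aa + v_bb - v_ab)
```

Comparing a molecule with itself gives exactly 1.0. The score falls toward 0 as the aligned shapes stop overlapping.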
High-dimensional Mixed Graphical Models
Cheng, Jie, Li, Tianxi, Levina, Elizaveta, Zhu, Ji
Jie Cheng†, Tianxi Li‡, Elizaveta Levina‡, Ji Zhu‡
†Google, Inc.; ‡Department of Statistics, University of Michigan
March 22, 2018. arXiv:1304.2810v3
Abstract
While graphical models for continuous data (Gaussian graphical models) and discrete data (Ising models) have been extensively studied, there is little work on graphical models for data sets with both continuous and discrete variables (mixed data), which are common in many scientific applications. We propose a novel graphical model for mixed data, which is simple enough to be suitable for high-dimensional data, yet flexible enough to represent all possible graph structures. We develop a computationally efficient regression-based algorithm for fitting the model by focusing on the conditional log-likelihood of each variable given the rest. The parameters have a natural group structure, and sparsity in the fitted graph is attained by incorporating a group lasso penalty, approximated by a weighted lasso penalty for computational efficiency. We demonstrate the effectiveness of our method through an extensive simulation study and apply it to a music annotation data set (CAL500), obtaining a sparse and interpretable graphical model relating the continuous features of the audio signal to binary variables such as genre, emotions, and usage associated with particular songs.
Key Words: Conditional Gaussian density, Graphical model, Group lasso, Mixed variables, Music annotation.
1 Introduction
Graphical models have proven to be a useful tool in representing the conditional dependency structure of multivariate distributions. The undirected graphical model in particular, sometimes also referred to as the Markov network, has drawn a notable amount of attention over the past decade. In an undirected graphical model, nodes in the graph represent the variables, while an edge between a pair of variables indicates that they are dependent conditional on all other variables.
The properties of these models are by now well understood and studied both in the classical and the high-dimensional settings. Both these models can only deal with variables of one kind - either all continuous variables in Gaussian models or all binary variables in the Ising model (extensions of the Ising model to general discrete data, while possible in principle, are rarely used in practice). In many applications, however, data sources are complex and varied, and frequently result in mixed types of data, with both continuous and discrete variables present in the same dataset. In this paper, we will focus on graphical models for this type of mixed data (mixed graphical models).
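For the continuous (Gaussian) case mentioned above, the link between edges and conditional dependence can be checked numerically. In a Gaussian graphical model, variables i and j are conditionally independent given the rest exactly when entry (i, j) of the precision matrix (the inverse covariance) is zero.

The sketch below uses a hypothetical three-variable chain in which only neighbors are connected. It covers only the Gaussian case, not the paper's mixed model or its group-lasso fitting.

```python
import numpy as np

# Hypothetical precision matrix for the chain X1 - X2 - X3:
# the (X1, X3) entry is zero, so there is no edge between them.
precision = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])
covariance = np.linalg.inv(precision)


def partial_correlations(theta):
    """Partial correlation of each pair given all other variables."""
    d = np.sqrt(np.diag(theta))
    pc = -theta / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc


def edges(theta, tol=1e-10):
    """Edges of the undirected graph: nonzero off-diagonal precision entries."""
    p = theta.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(theta[i, j]) > tol]
```

Here X1 and X3 are marginally correlated (their covariance is 0.25), yet there is no edge between them, because their dependence runs entirely through X2.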
Network Volume Anomaly Detection and Identification in Large-scale Networks based on Online Time-structured Traffic Tensor Tracking
Kasai, Hiroyuki, Kellerer, Wolfgang, Kleinsteuber, Martin
This paper addresses network anomography, that is, the problem of inferring network-level anomalies from indirect link measurements. This problem is cast as a low-rank subspace tracking problem for normal flows under incomplete observations, and an outlier detection problem for abnormal flows. Since traffic data is large-scale time-structured data accompanied by noise and outliers under partial observations, an efficient modeling method is essential. To this end, this paper proposes an online subspace tracking of a Hankelized time-structured traffic tensor for normal flows based on the Candecomp/PARAFAC decomposition exploiting the recursive least squares (RLS) algorithm. We estimate abnormal flows as outlier sparse flows via sparsity maximization in the underlying under-constrained linear-inverse problem. A major advantage is that our algorithm estimates normal flows by low-dimensional matrices with time-directional features as well as the spatial correlation of multiple links without using the past observed measurements and the past model parameters. Extensive numerical evaluations show that the proposed algorithm achieves faster convergence per iteration of model approximation, and better volume anomaly detection performance compared to state-of-the-art algorithms.
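The paper's tracker builds on recursive least squares (RLS). As a much simpler illustration of the RLS update itself, the sketch below tracks the weights of an ordinary linear model y ≈ wᵀx with exponential forgetting. It is not the paper's CP-decomposition tensor tracker. The class name, the forgetting factor, and the initialization constant `delta` are assumptions.

```python
import numpy as np


class RecursiveLeastSquares:
    """Exponentially weighted RLS for y_t ~ w^T x_t (hypothetical helper class)."""

    def __init__(self, dim, forgetting=0.99, delta=100.0):
        self.w = np.zeros(dim)            # current weight estimate
        self.P = delta * np.eye(dim)      # inverse (weighted) correlation matrix
        self.lam = forgetting             # 0 < lam <= 1; smaller forgets faster

    def update(self, x, y):
        """Incorporate one observation; return the a-priori prediction error."""
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)      # gain vector
        err = y - self.w @ x              # error before the update
        self.w = self.w + k * err
        # Rank-one downdate of P (P is symmetric, so k x^T P = outer(k, Px)).
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return err
```

Each update costs O(d²) and touches only the newest sample. This is the property that lets RLS-based trackers avoid revisiting past measurements. The returned a-priori error could serve as a crude anomaly signal, since a measurement that fits the tracked model badly produces a large error.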
Probabilistic Data Analysis with Probabilistic Programming
Saad, Feras, Mansinghka, Vikash
Probabilistic techniques are central to data analysis, but different approaches can be difficult to apply, combine, and compare. This paper introduces composable generative population models (CGPMs), a computational abstraction that extends directed graphical models and can be used to describe and compose a broad class of probabilistic data analysis techniques. Examples include hierarchical Bayesian models, multivariate kernel methods, discriminative machine learning, clustering algorithms, dimensionality reduction, and arbitrary probabilistic programs. We also demonstrate the integration of CGPMs into BayesDB, a probabilistic programming platform that can express data analysis tasks using a modeling language and a structured query language. The practical value is illustrated in two ways. First, CGPMs are used in an analysis that identifies satellite data records which probably violate Kepler's Third Law, by composing causal probabilistic programs with non-parametric Bayes in under 50 lines of probabilistic code. Second, for several representative data analysis tasks, we report on lines of code and accuracy measurements of various CGPMs, plus comparisons with standard baseline solutions from Python and MATLAB libraries.
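Kepler's third law ties an orbit's period to its semi-major axis, T = 2π·sqrt(a³/μ). A record whose reported period disagrees with its reported semi-major axis is therefore suspicious. The paper detects such records probabilistically with CGPMs in BayesDB. The sketch below is only a deterministic stand-in for the physics check. It assumes Earth-orbiting satellites, Earth's standard gravitational parameter μ ≈ 398600.4418 km³/s², and a hypothetical record layout and 5% tolerance.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418  # Earth's standard gravitational parameter (km^3/s^2)


def kepler_period_minutes(semi_major_axis_km):
    """Orbital period predicted by Kepler's third law, in minutes."""
    seconds = 2.0 * math.pi * math.sqrt(semi_major_axis_km ** 3 / MU_EARTH_KM3_S2)
    return seconds / 60.0


def flag_violations(records, rel_tol=0.05):
    """Return names of (name, semi_major_axis_km, period_min) records off by > rel_tol."""
    flagged = []
    for name, a_km, period_min in records:
        expected = kepler_period_minutes(a_km)
        if abs(period_min - expected) / expected > rel_tol:
            flagged.append(name)
    return flagged
```

As a sanity check, a geostationary orbit (a ≈ 42164 km) comes out at roughly 1436 minutes, one sidereal day. A fixed tolerance like this cannot express how unusual a record is relative to the rest of the data, which is what the probabilistic approach adds.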