Plotting

 Platt, John C.


Tackling Climate Change with Machine Learning

arXiv.org Artificial Intelligence

Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change.


Regularized Minimax Conditional Entropy for Crowdsourcing

arXiv.org Machine Learning

There is a rapidly increasing interest in crowdsourcing for data labeling. By crowdsourcing, a large number of labels can be often quickly gathered at low cost. However, the labels provided by the crowdsourcing workers are usually not of high quality. In this paper, we propose a minimax conditional entropy principle to infer ground truth from noisy crowdsourced labels. Under this principle, we derive a unique probabilistic labeling model jointly parameterized by worker ability and item difficulty. We also propose an objective measurement principle, and show that our method is the only method which satisfies this objective measurement principle. We validate our method through a variety of real crowdsourcing datasets with binary, multiclass or ordinal labels.


Learning from the Wisdom of Crowds by Minimax Entropy

Neural Information Processing Systems

An important way to make large training sets is to gather noisy labels from crowds of nonexperts. We propose a minimax entropy principle to improve the quality of these labels. Our method assumes that labels are generated by a probability distribution over workers, items, and labels. By maximizing the entropy of this distribution, the method naturally infers item confusability and worker expertise. We infer the ground truth by minimizing the entropy of this distribution, which we show minimizes the Kullback-Leibler (KL) divergence between the probability distribution and the unknown truth. We show that a simple coordinate descent scheme can optimize minimax entropy. Empirically, our results are substantially better than previously published methods for the same problem.


Fast Variational Inference for Large-scale Internet Diagnosis

Neural Information Processing Systems

Web servers on the Internet need to maintain high reliability, but the cause of intermittent failures of web transactions is non-obvious. We use Bayesian inference to diagnose problems with web services. This diagnosis problem is far larger than any previously attempted: it requires inference of 10^4 possible faults from 10^5 observations. Further, such inference must be performed in less than a second. Inference can be done at this speed by combining a variational approximation, a mean-field approximation, and the use of stochastic gradient descent to optimize a variational cost function. We use this fast inference to diagnose a time series of anomalous HTTP requests taken from a real web service. The inference is fast enough to analyze network logs with billions of entries in a matter of hours.


Multiple Instance Boosting for Object Detection

Neural Information Processing Systems

A good image object detection algorithm is accurate, fast, and does not require exact locations of objects in a training set. We can create such an object detector by taking the architecture of the Viola-Jones detector cascade and training it with a new variant of boosting that we call MIL-Boost. MILBoost uses cost functions from the Multiple Instance Learning literaturecombined with the AnyBoost framework. We adapt the feature selection criterion of MILBoost to optimize the performance of the Viola-Jones cascade. Experiments show that the detection rate is up to 1.6 times better using MILBoost. This increased detection rate shows the advantage of simultaneously learning the locations and scales of the objects in the training set along with the parameters of the classifier.


Fast Embedding of Sparse Similarity Graphs

Neural Information Processing Systems

This paper applies fast sparse multidimensional scaling (MDS) to a large graph of music similarity, with 267K vertices that represent artists, albums, and tracks; and 3.22M edges that represent similarity between those entities. Once vertices are assigned locations in a Euclidean space, the locations can be used to browse music and to generate playlists. MDS on very large sparse graphs can be effectively performed by a family of algorithms called Rectangular Dijsktra (RD) MDS algorithms. These RD algorithms operate on a dense rectangular slice of the distance matrix, created by calling Dijsktra a constant number of times. Two RD algorithms are compared: Landmark MDS, which uses the Nystrรถm approximation to perform MDS; and a new algorithm called Fast Sparse Embedding, which uses FastMap. These algorithms compare favorably to Laplacian Eigenmaps, both in terms of speed and embedding quality.


Fast Embedding of Sparse Similarity Graphs

Neural Information Processing Systems

This paper applies fast sparse multidimensional scaling (MDS) to a large graph of music similarity, with 267K vertices that represent artists, albums, andtracks; and 3.22M edges that represent similarity between those entities. Once vertices are assigned locations in a Euclidean space, the locations can be used to browse music and to generate playlists. MDS on very large sparse graphs can be effectively performed by a family of algorithms called Rectangular Dijsktra (RD) MDS algorithms. These RD algorithms operate on a dense rectangular slice of the distance matrix, created by calling Dijsktra a constant number of times. Two RD algorithms are compared: Landmark MDS, which uses the Nystrรถm approximation toperform MDS; and a new algorithm called Fast Sparse Embedding, which uses FastMap. These algorithms compare favorably to Laplacian Eigenmaps, both in terms of speed and embedding quality.


Learning a Gaussian Process Prior for Automatically Generating Music Playlists

Neural Information Processing Systems

This paper presents AutoDJ: a system for automatically generating music playlists based on one or more seed songs selected by a user. AutoDJ uses Gaussian Process Regression to learn a user preference function over songs. This function takes music metadata as inputs. This paper further introduces Kernel Meta-Training, which is a method of learning a Gaussian Process kernel from a distribution of functions that generates the learned function. For playlist generation, AutoDJ learns a kernel from a large set of albums. This learned kernel is shown to be more effective at predicting users' playlists than a reasonable hand-designed kernel.


Learning a Gaussian Process Prior for Automatically Generating Music Playlists

Neural Information Processing Systems

This paper presents AutoDJ: a system for automatically generating music playlistsbased on one or more seed songs selected by a user. AutoDJ uses Gaussian Process Regression to learn a user preference function over songs. This function takes music metadata as inputs. This paper further introduces Kernel Meta-Training, which is a method of learning a Gaussian Process kernel from a distribution of functions that generates the learned function. For playlist generation, AutoDJ learns a kernel from a large set of albums. This learned kernel is shown to be more effective at predicting users' playlists than a reasonable hand-designed kernel.


Speech Denoising and Dereverberation Using Probabilistic Models

Neural Information Processing Systems

This paper presents a unified probabilistic framework for denoising and dereverberation of speech signals. The framework transforms the denoising and dereverberation problems into Bayes-optimal signal estimation. The key idea is to use a strong speech model that is pre-trained on a large data set of clean speech. Computational efficiency is achieved by using variational EM, working in the frequency domain, and employing conjugate priors. The framework covers both single and multiple microphones. We apply this approach to noisy reverberant speech signals and get results substantially better than standard methods.