Bayesian Learning
Gaussian Processes for Music Audio Modelling and Content Analysis
Alvarado, Pablo A., Stowell, Dan
Real music signals are highly variable, yet they have strong statistical structure. Prior information about the underlying physical mechanisms by which sounds are generated and rules by which complex sound structure is constructed (notes, chords, a complete musical score), can be naturally unified using Bayesian modelling techniques. Typically algorithms for Automatic Music Transcription independently carry out individual tasks such as multiple-F0 detection and beat tracking. The challenge remains to perform joint estimation of all parameters. We present a Bayesian approach for modelling music audio, and content analysis. The proposed methodology based on Gaussian processes seeks joint estimation of multiple music concepts by incorporating into the kernel prior information about non-stationary behaviour, dynamics, and rich spectral content present in the modelled music signal. We illustrate the benefits of this approach via two tasks: pitch estimation, and inferring missing segments in a polyphonic audio recording.
Square Root Graphical Models: Multivariate Generalizations of Univariate Exponential Families that Permit Positive Dependencies
Inouye, David I., Ravikumar, Pradeep, Dhillon, Inderjit S.
We develop Square Root Graphical Models (SQR), a novel class of parametric graphical models that provides multivariate generalizations of univariate exponential family distributions. Previous multivariate graphical models [Yang et al. 2015] did not allow positive dependencies for the exponential and Poisson generalizations. However, in many real-world datasets, variables clearly have positive dependencies. For example, the airport delay time in New York---modeled as an exponential distribution---is positively related to the delay time in Boston. With this motivation, we give an example of our model class derived from the univariate exponential distribution that allows for almost arbitrary positive and negative dependencies with only a mild condition on the parameter matrix---a condition akin to the positive definiteness of the Gaussian covariance matrix. Our Poisson generalization allows for both positive and negative dependencies without any constraints on the parameter values. We also develop parameter estimation methods using node-wise regressions with $\ell_1$ regularization and likelihood approximation methods using sampling. Finally, we demonstrate our exponential generalization on a synthetic dataset and a real-world dataset of airport delay times.
Kalman-based Stochastic Gradient Method with Stop Condition and Insensitivity to Conditioning
Modern proximal and stochastic gradient descent (SGD) methods are believed to efficiently minimize large composite objective functions, but such methods have two algorithmic challenges: (1) a lack of fast or justified stop conditions, and (2) sensitivity to the objective function's conditioning. In response to the first challenge, modern proximal and SGD methods guarantee convergence only after multiple epochs, but such a guarantee renders proximal and SGD methods infeasible when the number of component functions is very large or infinite. In response to the second challenge, second order SGD methods have been developed, but they are marred by the complexity of their analysis. In this work, we address these challenges on the limited, but important, linear regression problem by introducing and analyzing a second order proximal/SGD method based on Kalman Filtering (kSGD). Through our analysis, we show kSGD is asymptotically optimal, develop a fast algorithm for very large, infinite or streaming data sources with a justified stop condition, prove that kSGD is insensitive to the problem's conditioning, and develop a unique approach for analyzing the complex second order dynamics. Our theoretical results are supported by numerical experiments on three regression problems (linear, nonparametric wavelet, and logistic) using three large publicly available datasets. Moreover, our analysis and experiments lay a foundation for embedding kSGD in multiple epoch algorithms, extending kSGD to other problem classes, and developing parallel and low memory kSGD implementations.
On the Theory and Practice of Privacy-Preserving Bayesian Data Analysis
Foulds, James, Geumlek, Joseph, Welling, Max, Chaudhuri, Kamalika
Bayesian inference has great promise for the privacy-preserving analysis of sensitive data, as posterior sampling automatically preserves differential privacy, an algorithmic notion of data privacy, under certain conditions (Dimitrakakis et al., 2014; Wang et al., 2015b). While this one posterior sample (OPS) approach elegantly provides privacy "for free," it is data inefficient in the sense of asymptotic relative efficiency (ARE). We show that a simple alternative based on the Laplace mechanism, the workhorse of differential privacy, is as asymptotically efficient as non-private posterior inference, under general assumptions. This technique also has practical advantages including efficient use of the privacy budget for MCMC. We demonstrate the practicality of our approach on a time-series analysis of sensitive military records from the Afghanistan and Iraq wars disclosed by the Wikileaks organization.
A moment-matching Ferguson and Klass algorithm
Completely random measures (CRM) represent the key building block of a wide variety of popular stochastic models and play a pivotal role in modern Bayesian Nonparametrics. A popular representation of CRMs as a random series with decreasing jumps is due to Ferguson and Klass (1972). This can immediately be turned into an algorithm for sampling realizations of CRMs or more elaborate models involving transformed CRMs. However, concrete implementation requires to truncate the random series at some threshold resulting in an approximation error. The goal of this paper is to quantify the quality of the approximation by a moment-matching criterion, which consists in evaluating a measure of discrepancy between actual moments and moments based on the simulation output. Seen as a function of the truncation level, the methodology can be used to determine the truncation level needed to reach a certain level of precision. The resulting moment-matching \FK algorithm is then implemented and illustrated on several popular Bayesian nonparametric models.
What are the Best Machine Learning Packages in R? R-bloggers
The most common question asked by prospective data scientists is – "What is the best programming language for Machine Learning?" The answer to this question always results in a debate whether to choose R, Python or MATLAB for Machine Learning. Nobody can, in reality, answer the question as to whether Python or R is best language for Machine Learning. However, the programming language one should choose for machine learning directly depends on the requirements of a given data problem, the likes and preferences of the data scientist and the context of machine learning activities they want to perform. According to a survey on Kaggler's Favourite Tools, the open source R programming language turned out to be the favourite among 543 Kagglers of the 1714 Kaggler's listing their data science tools.
Structure Learning in Graphical Modeling
Drton, Mathias, Maathuis, Marloes H.
A graphical model is a statistical model that is associated to a graph whose nodes correspond to variables of interest. The edges of the graph reflect allowed conditional dependencies among the variables. Graphical models admit computationally convenient factorization properties and have long been a valuable tool for tractable modeling of multivariate distributions. More recently, applications such as reconstructing gene regulatory networks from gene expression data have driven major advances in structure learning, that is, estimating the graph underlying a model. We review some of these advances and discuss methods such as the graphical lasso and neighborhood selection for undirected graphical models (or Markov random fields), and the PC algorithm and score-based search methods for directed graphical models (or Bayesian networks).
Classification of weak multi-view signals by sharing factors in a mixture of Bayesian group factor analyzers
Remes, Sami, Mononen, Tommi, Kaski, Samuel
We propose a novel classification model for weak signal data, building upon a recent model for Bayesian multi-view learning, Group Factor Analysis (GFA). Instead of assuming all data to come from a single GFA model, we allow latent clusters, each having a different GFA model and producing a different class distribution. We show that sharing information across the clusters, by sharing factors, increases the classification accuracy considerably; the shared factors essentially form a flexible noise model that explains away the part of data not related to classification. Motivation for the setting comes from single-trial functional brain imaging data, having a very low signal-to-noise ratio and a natural multi-view setting, with the different sensors, measurement modalities (EEG, MEG, fMRI) and possible auxiliary information as views. We demonstrate our model on a MEG dataset.
Revealed Preference at Scale: Learning Personalized Preferences from Assortment Choices
Kallus, Nathan, Udell, Madeleine
We consider the problem of learning the preferences of a heterogeneous population by observing choices from an assortment of products, ads, or other offerings. Our observation model takes a form common in assortment planning applications: each arriving customer is offered an assortment consisting of a subset of all possible offerings; we observe only the assortment and the customer's single choice. In this paper we propose a mixture choice model with a natural underlying low-dimensional structure, and show how to estimate its parameters. In our model, the preferences of each customer or segment follow a separate parametric choice model, but the underlying structure of these parameters over all the models has low dimension. We show that a nuclear-norm regularized maximum likelihood estimator can learn the preferences of all customers using a number of observations much smaller than the number of item-customer combinations. This result shows the potential for structural assumptions to speed up learning and improve revenues in assortment planning and customization. We provide a specialized factored gradient descent algorithm and study the success of the approach empirically.
Relaxation of the EM Algorithm via Quantum Annealing
Miyahara, Hideyuki, Tsumura, Koji
The EM algorithm is a novel numerical method to obtain maximum likelihood estimates and is often used for practical calculations. However, many of maximum likelihood estimation problems are nonconvex, and it is known that the EM algorithm fails to give the optimal estimate by being trapped by local optima. In order to deal with this difficulty, we propose a deterministic quantum annealing EM algorithm by introducing the mathematical mechanism of quantum fluctuations into the conventional EM algorithm because quantum fluctuations induce the tunnel effect and are expected to relax the difficulty of nonconvex optimization problems in the maximum likelihood estimation problems. We show a theorem that guarantees its convergence and give numerical experiments to verify its efficiency.