Goto

Collaborating Authors

 Learning Graphical Models


Encrypted statistical machine learning: new privacy preserving methods

arXiv.org Machine Learning

We present two new statistical machine learning methods designed to learn on fully homomorphic encrypted (FHE) data. The introduction of FHE schemes following Gentry (2009) opens up the prospect of privacy preserving statistical machine learning analysis and modelling of encrypted data without compromising security constraints. We propose tailored algorithms for applying extremely random forests, involving a new cryptographic stochastic fraction estimator, and na\"{i}ve Bayes, involving a semi-parametric model for the class decision boundary, and show how they can be used to learn and predict from encrypted data. We demonstrate that these techniques perform competitively on a variety of classification data sets and provide detailed information about the computational practicalities of these and other FHE methods.


Nested Hierarchical Dirichlet Processes for Multi-Level Non-Parametric Admixture Modeling

arXiv.org Machine Learning

Dirichlet Process(DP) is a Bayesian non-parametric prior for infinite mixture modeling, where the number of mixture components grows with the number of data items. The Hierarchical Dirichlet Process (HDP), is an extension of DP for grouped data, often used for non-parametric topic modeling, where each group is a mixture over shared mixture densities. The Nested Dirichlet Process (nDP), on the other hand, is an extension of the DP for learning group level distributions from data, simultaneously clustering the groups. It allows group level distributions to be shared across groups in a non-parametric setting, leading to a non-parametric mixture of mixtures. The nCRF extends the nDP for multilevel non-parametric mixture modeling, enabling modeling topic hierarchies. However, the nDP and nCRF do not allow sharing of distributions as required in many applications, motivating the need for multi-level non-parametric admixture modeling. We address this gap by proposing multi-level nested HDPs (nHDP) where the base distribution of the HDP is itself a HDP at each level thereby leading to admixtures of admixtures at each level. Because of couplings between various HDP levels, scaling up is naturally a challenge during inference. We propose a multi-level nested Chinese Restaurant Franchise (nCRF) representation for the nested HDP, with which we outline an inference algorithm based on Gibbs Sampling. We evaluate our model with the two level nHDP for non-parametric entity topic modeling where an inner HDP creates a countably infinite set of topic mixtures and associates them with author entities, while an outer HDP associates documents with these author entities. In our experiments on two real world research corpora, the nHDP is able to generalize significantly better than existing models and detect missing author entities with a reasonable level of accuracy.


Gaussian Mixture Models with Component Means Constrained in Pre-selected Subspaces

arXiv.org Machine Learning

We investigate a Gaussian mixture model (GMM) with component means constrained in a pre-selected subspace. Applications to classification and clustering are explored. An EM-type estimation algorithm is derived. We prove that the subspace containing the component means of a GMM with a common covariance matrix also contains the modes of the density and the class means. This motivates us to find a subspace by applying weighted principal component analysis to the modes of a kernel density and the class means. To circumvent the difficulty of deciding the kernel bandwidth, we acquire multiple subspaces from the kernel densities based on a sequence of bandwidths. The GMM constrained by each subspace is estimated; and the model yielding the maximum likelihood is chosen. A dimension reduction property is proved in the sense of being informative for classification or clustering. Experiments on real and simulated data sets are conducted to examine several ways of determining the subspace and to compare with the reduced rank mixture discriminant analysis (MDA). Our new method with the simple technique of spanning the subspace only by class means often outperforms the reduced rank MDA when the subspace dimension is very low, making it particularly appealing for visualization.


Automatic Relevance Determination For Deep Generative Models

arXiv.org Machine Learning

A recurring problem when building probabilistic latent variable models is regularization and model selection, for instance, the choice of the dimensionality of the latent space. In the context of belief networks with latent variables, this problem has been adressed with Automatic Relevance Determination (ARD) employing Monte Carlo inference. We present a variational inference approach to ARD for Deep Generative Models using doubly stochastic variational inference to provide fast and scalable learning. We show empirical results on a standard dataset illustrating the effects of contracting the latent space automatically. We show that the resulting latent representations are significantly more compact without loss of expressive power of the learned models.


Another Look at DWD: Thrifty Algorithm and Bayes Risk Consistency in RKHS

arXiv.org Machine Learning

Distance weighted discrimination (DWD) is a margin-based classifier with an interesting geometric motivation. DWD was originally proposed as a superior alternative to the support vector machine (SVM), however DWD is yet to be popular compared with the SVM. The main reasons are twofold. First, the state-of-the-art algorithm for solving DWD is based on the second-order-cone programming (SOCP), while the SVM is a quadratic programming problem which is much more efficient to solve. Second, the current statistical theory of DWD mainly focuses on the linear DWD for the high-dimension-low-sample-size setting and data-piling, while the learning theory for the SVM mainly focuses on the Bayes risk consistency of the kernel SVM. In fact, the Bayes risk consistency of DWD is presented as an open problem in the original DWD paper. In this work, we advance the current understanding of DWD from both computational and theoretical perspectives. We propose a novel efficient algorithm for solving DWD, and our algorithm can be several hundred times faster than the existing state-of-the-art algorithm based on the SOCP. In addition, our algorithm can handle the generalized DWD, while the SOCP algorithm only works well for a special DWD but not the generalized DWD. Furthermore, we consider a natural kernel DWD in a reproducing kernel Hilbert space and then establish the Bayes risk consistency of the kernel DWD. We compare DWD and the SVM on several benchmark data sets and show that the two have comparable classification accuracy, but DWD equipped with our new algorithm can be much faster to compute than the SVM.


An Experimental Comparison of Hybrid Algorithms for Bayesian Network Structure Learning

arXiv.org Artificial Intelligence

We present a novel hybrid algorithm for Bayesian network structure learning, called Hybrid HPC (H2PC). It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. It is based on a subroutine called HPC, that combines ideas from incremental and divide-and-conquer constraint-based methods to learn the parents and children of a target variable. We conduct an experimental comparison of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning, on several benchmarks with various data sizes. Our extensive experiments show that H2PC outperforms MMHC both in terms of goodness of fit to new data and in terms of the quality of the network structure itself, which is closer to the true dependence structure of the data. The source code (in R) of H2PC as well as all data sets used for the empirical tests are publicly available.


Bayesian Hypothesis Testing for Block Sparse Signal Recovery

arXiv.org Machine Learning

This letter presents a novel Block Bayesian Hypothesis Testing Algorithm (Block-BHTA) for reconstructing block sparse signals with unknown block structures. The Block-BHTA comprises the detection and recovery of the supports, and the estimation of the amplitudes of the block sparse signal. The support detection and recovery is performed using a Bayesian hypothesis testing. Then, based on the detected and reconstructed supports, the nonzero amplitudes are estimated by linear MMSE. The effectiveness of Block-BHTA is demonstrated by numerical experiments.


On Monotonicity of the Optimal Transmission Policy in Cross-layer Adaptive m-QAM Modulation

arXiv.org Machine Learning

This paper considers a cross-layer adaptive modulation system that is modeled as a Markov decision process (MDP). We study how to utilize the monotonicity of the optimal transmission policy to relieve the computational complexity of dynamic programming (DP). In this system, a scheduler controls the bit rate of the m-quadrature amplitude modulation (m-QAM) in order to minimize the long-term losses incurred by the queue overflow in the data link layer and the transmission power consumption in the physical layer. The work is done in two steps. Firstly, we observe the L-natural-convexity and submodularity of DP to prove that the optimal policy is always nondecreasing in queue occupancy/state and derive the sufficient condition for it to be nondecreasing in both queue and channel states. We also show that, due to the L-natural-convexity of DP, the variation of the optimal policy in queue state is restricted by a bounded marginal effect: The increment of the optimal policy between adjacent queue states is no greater than one. Secondly, we use the monotonicity results to present two low complexity algorithms: monotonic policy iteration (MPI) based on L-natural-convexity and discrete simultaneous perturbation stochastic approximation (DSPSA). We run experiments to show that the time complexity of MPI based on L-natural-convexity is much lower than that of DP and the conventional MPI that is based on submodularity and DSPSA is able to adaptively track the optimal policy when the system parameters change.


Spatio-temporal Spike and Slab Priors for Multiple Measurement Vector Problems

arXiv.org Machine Learning

We are interested in solving the multiple measurement vector (MMV) problem for instances, where the underlying sparsity pattern exhibit spatio-temporal structure motivated by the electroencephalogram (EEG) source localization problem. We propose a probabilistic model that takes this structure into account by generalizing the structured spike and slab prior and the associated Expectation Propagation inference scheme. Based on numerical experiments, we demonstrate the viability of the model and the approximate inference scheme.


Listen, Attend and Spell

arXiv.org Machine Learning

We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters. This is the key improvement of LAS over previous end-to-end CTC models. On a subset of the Google voice search task, LAS achieves a word error rate (WER) of 14.1% without a dictionary or a language model, and 10.3% with language model rescoring over the top 32 beams. By comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 8.0%.