Bayesian Learning
Machine Learning with Operational Costs
Tulabandhula, Theja, Rudin, Cynthia
This work proposes a way to align statistical modeling with decision making. We provide a method that propagates the uncertainty in predictive modeling to the uncertainty in operational cost, where operational cost is the amount spent by the practitioner in solving the problem. The method allows us to explore the range of operational costs associated with the set of reasonable statistical models, so as to provide a useful way for practitioners to understand uncertainty. To do this, the operational cost is cast as a regularization term in a learning algorithm's objective function, allowing either an optimistic or pessimistic view of possible costs, depending on the regularization parameter. From another perspective, if we have prior knowledge about the operational cost, for instance that it should be low, this knowledge can help to restrict the hypothesis space, and can help with generalization. We provide a theoretical generalization bound for this scenario. We also show that learning with operational costs is related to robust optimization.
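As a toy illustration of the regularization idea in this abstract, the sketch below fits a one-parameter linear model by least squares plus a term `c` times an operational cost. The cost function (the predicted value at a hypothetical future point `x_new`, read as units to provision), the data, and all names are assumptions made for illustration and are not taken from the paper. In this sketch, a positive `c` favors low-cost models (an optimistic view) and a negative `c` favors high-cost models (a pessimistic view).

```python
def fit_with_operational_cost(xs, ys, x_new, c):
    """Minimize sum((y - w*x)**2) + c * (w * x_new) over a scalar w.

    The operational cost w * x_new is a hypothetical stand-in, not the
    paper's cost. Setting the derivative to zero gives a closed form:
    -2*sxy + 2*w*sxx + c*x_new = 0.
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (sxy - 0.5 * c * x_new) / sxx


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
x_new = 4.0

w_plain = fit_with_operational_cost(xs, ys, x_new, c=0.0)        # ordinary least squares
w_optimistic = fit_with_operational_cost(xs, ys, x_new, c=1.0)   # pushes cost down
w_pessimistic = fit_with_operational_cost(xs, ys, x_new, c=-1.0) # pushes cost up

# Range of operational costs spanned by the three fitted models.
cost_range = (w_optimistic * x_new, w_pessimistic * x_new)
```

Sweeping `c` over a range of values traces out how far the operational cost can move while the model still fits the data reasonably well.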
Group Symmetry and non-Gaussian Covariance Estimation
Soloveychik, Ilya, Wiesel, Ami
We consider robust covariance estimation with group symmetry constraints. Non-Gaussian covariance estimation, e.g., via Tyler's scatter estimator or Multivariate Generalized Gaussian distribution methods, usually involves non-convex minimization problems. Recently, it was shown that the underlying principle behind the success of these methods is an extended form of convexity over the geodesics in the manifold of positive definite matrices. A modern approach to improving estimation accuracy is to exploit prior knowledge via additional constraints, e.g., restricting attention to specific classes of covariances that adhere to prior symmetry structures. In this paper, we prove that such group symmetry constraints are also geodesically convex and can therefore be incorporated into various non-Gaussian covariance estimators. Practical examples of such sets include circulant, persymmetric, and complex/quaternion proper structures. We provide a simple numerical technique for finding maximum likelihood estimates under such constraints, and demonstrate their performance advantage using synthetic experiments.
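To make the setting concrete, the following is a minimal sketch of Tyler's fixed-point iteration for 2x2 data, with each iterate averaged over the persymmetric group (identity and coordinate exchange) so that the two diagonal entries stay equal. The averaging step is an assumption chosen for illustration and is not necessarily the numerical technique the authors provide; the function name, data, and iteration count are likewise hypothetical.

```python
import random


def tyler_persymmetric_2x2(samples, iters=50):
    """Tyler's fixed-point iteration with a persymmetric averaging step.

    Illustrative sketch only: each iterate is averaged with its
    coordinate-exchanged copy and rescaled so that the trace equals 2.
    """
    p = 2
    n = len(samples)
    sigma = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(iters):
        a, b, d = sigma[0][0], sigma[0][1], sigma[1][1]
        det = a * d - b * b
        ia, ib, id_ = d / det, -b / det, a / det  # entries of the inverse
        s00 = s01 = s11 = 0.0
        for x0, x1 in samples:
            q = ia * x0 * x0 + 2.0 * ib * x0 * x1 + id_ * x1 * x1
            s00 += x0 * x0 / q
            s01 += x0 * x1 / q
            s11 += x1 * x1 / q
        a, b, d = p / n * s00, p / n * s01, p / n * s11
        diag = 0.5 * (a + d)        # average over the exchange symmetry
        scale = p / (2.0 * diag)    # fix the trace to p
        sigma = [[diag * scale, b * scale], [b * scale, diag * scale]]
    return sigma


# Synthetic data with correlation 0.8 between the two coordinates.
rng = random.Random(0)
samples = []
for _ in range(500):
    e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    samples.append((e0, 0.8 * e0 + 0.6 * e1))

sigma_hat = tyler_persymmetric_2x2(samples)
```

Because Tyler's estimator is only defined up to scale, the trace normalization is a convention; the off-diagonal entry of `sigma_hat` then estimates the correlation-like shape parameter.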
Bayesian test of significance for conditional independence: The multinomial model
Andrade, Pablo de Morais, Stern, Julio Michael, Pereira, Carlos Alberto de Bragança
Conditional independence tests (CI tests) have received special attention lately in the Machine Learning and Computational Intelligence literature as an important indicator of the relationship among the variables used by their models. In the field of Probabilistic Graphical Models (PGM), which includes Bayesian Network (BN) models, CI tests are especially important for the task of learning the PGM structure from data. In this paper, we propose the Full Bayesian Significance Test (FBST) for tests of conditional independence for discrete datasets. FBST is a powerful Bayesian test for precise hypotheses, as an alternative to frequentist significance tests (characterized by the calculation of the p-value).
Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation
Hsieh, Cho-Jui, Sustik, Matyas A., Dhillon, Inderjit S., Ravikumar, Pradeep
The L1-regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix, or alternatively the underlying graph structure of a Gaussian Markov Random Field, from very limited samples. We propose a novel algorithm for solving the resulting optimization problem which is a regularized log-determinant program. In contrast to recent state-of-the-art methods that largely use first order gradient information, our algorithm is based on Newton's method and employs a quadratic approximation, but with some modifications that leverage the structure of the sparse Gaussian MLE problem. We show that our method is superlinearly convergent, and present experimental results using synthetic and real-world application data that demonstrate the considerable improvements in performance of our method when compared to other state-of-the-art methods.
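For reference, the regularized log-determinant program mentioned in this abstract is commonly written as minimizing -log det(Theta) + tr(S Theta) + lam * ||Theta||_1 over positive definite Theta, where S is the sample covariance. The sketch below only evaluates that objective for 2x2 symmetric matrices; it does not implement the Newton-type solver the abstract describes. Penalizing every entry (rather than only off-diagonal ones) and the function name are assumptions made for illustration.

```python
import math


def sparse_mle_objective(theta, s, lam):
    """Evaluate -log det(theta) + tr(s @ theta) + lam * ||theta||_1 for 2x2 inputs.

    theta and s are nested lists representing symmetric 2x2 matrices.
    The L1 term here sums absolute values of all entries (an assumption).
    """
    (a, b), (c, d) = theta
    det = a * d - b * c
    if a <= 0.0 or det <= 0.0:
        raise ValueError("theta must be positive definite")
    trace_s_theta = sum(s[i][j] * theta[j][i] for i in range(2) for j in range(2))
    l1_norm = sum(abs(v) for row in theta for v in row)
    return -math.log(det) + trace_s_theta + lam * l1_norm


identity = [[1.0, 0.0], [0.0, 1.0]]
value_unpenalized = sparse_mle_objective(identity, identity, lam=0.0)
value_penalized = sparse_mle_objective(identity, identity, lam=0.5)
```

The log-determinant term keeps the iterate positive definite, while the L1 term drives entries of the inverse covariance to zero, which is what yields a sparse graph structure.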
A Greedy Approximation of Bayesian Reinforcement Learning with Probably Optimistic Transition Model
Kawaguchi, Kenji, Araya, Mauricio
Bayesian Reinforcement Learning (RL) is capable of not only incorporating domain knowledge, but also solving the exploration-exploitation dilemma in a natural way. As Bayesian RL is intractable except for special cases, previous work has proposed several approximation methods. However, these methods are usually too sensitive to parameter values, and finding an acceptable parameter setting is practically impossible in many applications. In this paper, we propose a new algorithm that greedily approximates Bayesian RL to achieve robustness in parameter space. We show that for a desired learning behavior, our proposed algorithm has a polynomial sample complexity that is lower than those of existing algorithms. We also demonstrate that the proposed algorithm naturally outperforms other existing algorithms when the prior distributions are not significantly misleading. On the other hand, the proposed algorithm cannot handle greatly misspecified priors as well as the other algorithms can. This is a natural consequence of the fact that the proposed algorithm is greedier than the other algorithms. Accordingly, we discuss a way to select an appropriate algorithm for different tasks based on the algorithms' greediness. We also introduce a new way of simplifying Bayesian planning, based on which future work would be able to derive new algorithms.
Hybrid Maximum Likelihood Modulation Classification Using Multiple Radios
Ozdemir, Onur, Li, Ruoyu, Varshney, Pramod K.
The performance of a modulation classifier is highly sensitive to channel signal-to-noise ratio (SNR). In this paper, we focus on amplitude-phase modulations and propose a modulation classification framework based on centralized data fusion using multiple radios and the hybrid maximum likelihood (ML) approach. To alleviate the computational complexity associated with ML estimation, we adopt the Expectation Maximization (EM) algorithm. Due to SNR diversity, the proposed multi-radio framework provides robustness to channel SNR. Numerical results show the superiority of the proposed approach over single-radio approaches as well as over modulation classifiers using moment-based estimators.
A Factor Graph Approach to Joint OFDM Channel Estimation and Decoding in Impulsive Noise Environments
Nassar, Marcel, Schniter, Philip, Evans, Brian L.
We propose a novel receiver for orthogonal frequency division multiplexing (OFDM) transmissions in impulsive noise environments. Impulsive noise arises in many modern wireless and wireline communication systems, such as Wi-Fi and powerline communications, due to uncoordinated interference that is much stronger than thermal noise. We first show that the bit-error-rate optimal receiver jointly estimates the propagation channel coefficients, the noise impulses, the finite-alphabet symbols, and the unknown bits. We then propose a near-optimal yet computationally tractable approach to this joint estimation problem using loopy belief propagation. In particular, we merge the recently proposed "generalized approximate message passing" (GAMP) algorithm with the forward-backward algorithm and soft-input soft-output decoding using a "turbo" approach. Numerical results indicate that the proposed receiver drastically outperforms existing receivers under impulsive noise and comes within 1 dB of the matched-filter bound. Meanwhile, with N tones, the proposed factor-graph-based receiver has only O(N log N) complexity, and it can be parallelized.
Fast Dual Variational Inference for Non-Conjugate LGMs
Khan, Mohammad Emtiyaz, Aravkin, Aleksandr Y., Friedlander, Michael P., Seeger, Matthias
Latent Gaussian models (LGMs) are widely used in statistics and machine learning. Bayesian inference in non-conjugate LGMs is difficult due to intractable integrals involving the Gaussian prior and non-conjugate likelihoods. Algorithms based on variational Gaussian (VG) approximations are widely employed since they strike a favorable balance between accuracy, generality, speed, and ease of use. However, the structure of the optimization problems associated with these approximations remains poorly understood, and standard solvers take too long to converge. We derive a novel dual variational inference approach that exploits the convexity property of the VG approximations. We obtain an algorithm that solves a convex optimization problem, reduces the number of variational parameters, and converges much faster than previous methods. Using real-world data, we demonstrate these advantages on a variety of LGMs, including Gaussian process classification, and latent Gaussian Markov random fields.
Fast Gradient-Based Inference with Continuous Latent Variable Models in Auxiliary Form
We propose a technique for increasing the efficiency of gradient-based inference and learning in Bayesian networks with multiple layers of continuous latent variables. We show that, in many cases, it is possible to express such models in an auxiliary form, where continuous latent variables are conditionally deterministic given their parents and a set of independent auxiliary variables. Variables of models in this auxiliary form have much larger Markov blankets, leading to significant speedups in gradient-based inference, e.g., rapid-mixing Hybrid Monte Carlo and efficient gradient-based optimization. The relative efficiency is confirmed in experiments.
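A minimal sketch of what an auxiliary form can look like in the simplest case: a Gaussian latent variable z ~ N(mu, sigma^2) rewritten as a deterministic function z = mu + sigma * eps of its parameters and an independent auxiliary variable eps ~ N(0, 1). The Gaussian choice, the function names, and the numbers are assumptions for illustration; the abstract itself does not specify which distributions are covered.

```python
import random


def sample_auxiliary_form(mu, sigma, rng):
    """Draw z ~ N(mu, sigma^2) as a deterministic function of (mu, sigma, eps)."""
    eps = rng.gauss(0.0, 1.0)  # independent auxiliary variable
    z = mu + sigma * eps       # deterministic given the parents and eps
    return z, eps


def dz_dparams(eps):
    """Partial derivatives (dz/dmu, dz/dsigma) of z = mu + sigma * eps."""
    return 1.0, eps


rng = random.Random(0)
mu, sigma = 1.5, 0.5
draws = [sample_auxiliary_form(mu, sigma, rng)[0] for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```

Since z is a differentiable function of `mu` and `sigma` once `eps` is fixed, gradients can flow through the latent variable, which is what makes gradient-based inference applicable.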
Declarative Modeling and Bayesian Inference of Dark Matter Halos
Probabilistic programming allows probabilistic models to be specified in a declarative manner. Recently, several new software systems and languages for probabilistic programming have been developed on the basis of new and improved methods for approximate inference in probabilistic models. In this contribution, a probabilistic model for an idealized dark matter localization problem is described. We first derive the probabilistic model for the inference of dark matter locations and masses, and then show how this model can be implemented using BUGS and Infer.NET, two software systems for probabilistic programming. Finally, the different capabilities of both systems are discussed. The presented dark matter model consists mainly of non-conjugate factors, which makes it difficult to implement with Infer.NET.