Bayesian Inference
Variational Federated Multi-Task Learning
Corinzia, Luca, Buhmann, Joachim M.
In classical federated learning a central server coordinates the training of a single model on a massively distributed network of devices. This setting can be naturally extended to a multi-task learning framework to handle real-world federated datasets, which typically show strongly non-IID data distributions across devices. Even though federated multi-task learning has been shown to be an effective paradigm for real-world datasets, it has been applied only to convex models. In this work we introduce VIRTUAL, an algorithm for federated multi-task learning with non-convex models. In VIRTUAL the federated network of the server and the clients is treated as a star-shaped Bayesian network, and learning is performed on the network using approximate variational inference. We show that this method is effective on real-world federated datasets, outperforming the current state of the art in federated learning.
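The abstract does not spell out VIRTUAL's update rules. As a hedged illustration of one building block that star-shaped Bayesian networks commonly rely on, the sketch below fuses diagonal-Gaussian factors (for example, a shared server factor and per-client factors) by adding their natural parameters. The function name `gaussian_product` and the diagonal-Gaussian assumption are ours, not the authors'.

```python
import numpy as np

def gaussian_product(means, variances):
    """Multiply independent diagonal Gaussian factors N(mean_k, var_k).

    The (normalized) product is Gaussian: its precision is the sum of the
    factor precisions, and its mean is the precision-weighted average of
    the factor means. Rows index factors, columns index parameters.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    post_precision = precisions.sum(axis=0)
    post_mean = (precisions * means).sum(axis=0) / post_precision
    return post_mean, 1.0 / post_precision

# A server factor N(0, 1) fused with two client factors N(1, 1) and N(2, 1).
mean, var = gaussian_product([[0.0], [1.0], [2.0]], [[1.0], [1.0], [1.0]])
```

Working in natural parameters makes the fusion a simple sum, which is why message-passing style updates on a star graph are often written this way.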
Education In The Age Of Machine Learning Big Cloud Recruitment
Machine Learning, often abbreviated to ML, is a form of learning in which systems use complex computer algorithms to acquire knowledge or skill automatically, without being explicitly programmed. It is considered a type of AI (Artificial Intelligence), since machines are built to learn and make decisions from the available data and even to improve from experience without requiring human involvement. The main goal is to improve the machine's performance on a task. ML draws on mathematics, computer science, and statistics. Scientists such as Andrey Markov, Thomas Bayes, and Carl Friedrich Gauss contributed, respectively, Markov chains, Bayes' theorem, and the method of least squares, all of which are used extensively in machine learning algorithms.
Modeling the Dynamics of PDE Systems with Physics-Constrained Deep Auto-Regressive Networks
Geneva, Nicholas, Zabaras, Nicholas
In recent years, deep learning has proven to be a viable methodology for surrogate modeling and uncertainty quantification for a vast number of physical systems. However, in their traditional form, such models require a large amount of training data. This is of particular importance for various engineering and scientific applications where data may be extremely expensive to obtain. To overcome this shortcoming, physics-constrained deep learning provides a promising methodology, as it relies only on the governing equations. In this work, we propose a novel auto-regressive dense encoder-decoder convolutional neural network to solve and model transient systems with non-linear dynamics at a computational cost that is potentially orders of magnitude lower than that of standard numerical solvers. This model includes a Bayesian framework that allows for uncertainty quantification of the predicted quantities of interest at each time-step. We rigorously test this model on several non-linear transient partial differential equation systems, including the turbulence of the Kuramoto-Sivashinsky equation, multi-shock formation and interaction with the 1D Burgers' equation, and 2D wave dynamics with coupled Burgers' equations. For each system, the predictive results and uncertainty are presented and discussed, together with comparisons to the results obtained from traditional numerical analysis methods.
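To make "only utilizes the governing equations" concrete, here is a minimal sketch of a physics-constrained loss for the 1D viscous Burgers' equation u_t + u u_x = nu u_xx: the squared residual of the PDE between two consecutive states, which an auto-regressive model could minimize without labelled data. The periodic grid, the finite-difference stencils, and the names `burgers_residual` and `physics_loss` are our assumptions; the paper's own discretization and loss may differ.

```python
import numpy as np

def burgers_residual(u_prev, u_next, dx, dt, nu):
    """Finite-difference residual of u_t + u u_x - nu u_xx = 0 on a periodic grid.

    The time derivative is a forward difference between consecutive states;
    spatial derivatives are central differences evaluated at u_next.
    """
    u_t = (u_next - u_prev) / dt
    u_x = (np.roll(u_next, -1) - np.roll(u_next, 1)) / (2.0 * dx)
    u_xx = (np.roll(u_next, -1) - 2.0 * u_next + np.roll(u_next, 1)) / dx**2
    return u_t + u_next * u_x - nu * u_xx

def physics_loss(u_prev, u_next, dx, dt, nu):
    """Mean squared PDE residual: a training signal that needs no solution data."""
    r = burgers_residual(u_prev, u_next, dx, dt, nu)
    return float(np.mean(r**2))
```

In an auto-regressive setup, `u_next` would be the network's prediction from `u_prev`, and the prediction would then be fed back as the next input.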
Selective prediction-set models with coverage guarantees
Feng, Jean, Sondhi, Arjun, Perry, Jessica, Simon, Noah
Though black-box predictors are state-of-the-art for many complex tasks, they often fail to properly quantify predictive uncertainty and may provide inappropriate predictions for unfamiliar data. Instead, we can learn more reliable models by letting them either output a prediction set or abstain when the uncertainty is high. We propose training these selective prediction-set models using an uncertainty-aware loss minimization framework, which unifies ideas from decision theory and robust maximum likelihood. Moreover, since black-box methods are not guaranteed to output well-calibrated prediction sets, we show how to calculate point estimates and confidence intervals for the true coverage of any selective prediction-set model, as well as a uniform mixture of K set models obtained from K-fold sample-splitting. When applied to predicting in-hospital mortality and length-of-stay for ICU patients, our model outperforms existing approaches on both in-sample and out-of-sample age groups, and our recalibration method provides accurate inference for prediction set coverage.
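As a rough illustration of estimating "the true coverage" of a selective prediction-set model, the sketch below computes the empirical coverage over the examples where the model does not abstain, with a normal-approximation (Wald) confidence interval. Defining coverage conditionally on not abstaining, the Wald interval, and the name `selective_coverage` are all our assumptions; the paper's estimator, and its handling of the K-fold mixture, may differ.

```python
import math

def selective_coverage(pred_sets, labels, z=1.96):
    """Estimate coverage of a selective prediction-set model.

    pred_sets: one set of candidate labels per example, or None where the
    model abstains. labels: the true labels.
    Returns (point_estimate, (lower, upper)) using a Wald interval computed
    over the non-abstained examples only.
    """
    hits = [y in s for s, y in zip(pred_sets, labels) if s is not None]
    n = len(hits)
    if n == 0:
        raise ValueError("model abstained on every example")
    p = sum(hits) / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, (max(0.0, p - half_width), min(1.0, p + half_width))

point, (lower, upper) = selective_coverage(
    [{0, 1}, {1}, None, {2}, {0}], [1, 0, 2, 2, 0]
)
```

A Wald interval is the simplest choice; with small accepted sets or coverage near 1, an exact or Wilson interval would be more reliable.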
Statistical Inference for Generative Models with Maximum Mean Discrepancy
Briol, Francois-Xavier, Barp, Alessandro, Duncan, Andrew B., Girolami, Mark
While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable but simulation is cheap. The distance considered, maximum mean discrepancy (MMD), is defined through the embedding of probability measures into a reproducing kernel Hilbert space. We study the theoretical properties of these estimators, showing that they are consistent, asymptotically normal and robust to model misspecification. A main advantage of these estimators is the flexibility offered by the choice of kernel, which can be used to trade off statistical efficiency and robustness. On the algorithmic side, we study the geometry induced by MMD on the parameter space and use this to introduce a novel natural gradient descent-like algorithm for efficient implementation of these estimators. We illustrate the relevance of our theoretical results on several classes of models, including a discrete-time latent Markov process and two multivariate stochastic differential equation models.
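The minimum-MMD idea can be sketched in a few lines: estimate the squared MMD between observed data and simulator output with a Gaussian kernel, then pick the parameter that minimizes it. The toy simulator N(theta, 1), the fixed lengthscale, and plain grid search (instead of the paper's natural-gradient-like algorithm) are simplifying assumptions of ours.

```python
import numpy as np

def gaussian_kernel(a, b, lengthscale):
    """k(a, b) = exp(-||a - b||^2 / (2 l^2)) for all pairs of rows."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * lengthscale**2))

def mmd2_unbiased(x, y, lengthscale=1.0):
    """Unbiased estimate of squared MMD between samples x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    kxx = gaussian_kernel(x, x, lengthscale)
    kyy = gaussian_kernel(y, y, lengthscale)
    kxy = gaussian_kernel(x, y, lengthscale)
    # Exclude the diagonal terms k(x_i, x_i) and k(y_j, y_j) to remove the bias.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()

def fit_location(data, thetas, n_sim=200, seed=0):
    """Minimum-MMD estimate of theta for the toy simulator N(theta, 1).

    The same base noise is reused for every theta (common random numbers),
    so the objective changes smoothly with theta.
    """
    rng = np.random.default_rng(seed)
    base = rng.standard_normal((n_sim, data.shape[1]))
    losses = [mmd2_unbiased(data, theta + base) for theta in thetas]
    return thetas[int(np.argmin(losses))]

data = np.random.default_rng(1).normal(2.0, 1.0, size=(200, 1))
theta_hat = fit_location(data, np.linspace(-1.0, 5.0, 61))
```

Changing `lengthscale` is the simplest instance of the kernel choice the abstract mentions as a lever between efficiency and robustness.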
Variance Estimation For Online Regression via Spectrum Thresholding
Kozdoba, Mark, Moroshko, Edward, Mannor, Shie, Crammer, Koby
We consider the online linear regression problem, where the predictor vector may vary with time. This problem can be modelled as a linear dynamical system, where the parameters that need to be learned are the variances of the process noise and of the observation noise. The classical approach to learning these variances is via the maximum likelihood estimator, a non-convex optimization problem that is prone to local minima and has no finite sample complexity bounds. In this paper we study the global system operator: the operator that maps the noise vectors to the output. In particular, we obtain estimates of its spectrum, and as a result derive the first known variance estimators with sample complexity guarantees for online regression problems. We demonstrate the approach on a number of synthetic and real-world benchmarks.
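For context, the sketch below writes out the linear dynamical system described above (random-walk weights with process-noise variance q, observations with noise variance r) and evaluates the classical Kalman-filter log-likelihood of (q, r), i.e. the non-convex objective the maximum likelihood approach optimizes. It is not the paper's spectral estimator; the names `simulate` and `kalman_loglik` and the zero initial weights are illustrative assumptions.

```python
import numpy as np

def simulate(T, d, q, r, seed=0):
    """Draw data from y_t = x_t . w_t + eps_t with w_t = w_{t-1} + eta_t."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((T, d))
    w = np.zeros(d)
    y = np.empty(T)
    for t in range(T):
        w = w + np.sqrt(q) * rng.standard_normal(d)              # process noise, variance q
        y[t] = X[t] @ w + np.sqrt(r) * rng.standard_normal()     # observation noise, variance r
    return X, y

def kalman_loglik(X, y, q, r, prior_var=0.0):
    """Gaussian log-likelihood of (q, r) computed with a Kalman filter.

    prior_var=0 matches simulate(), whose weights start exactly at zero.
    """
    T, d = X.shape
    m = np.zeros(d)
    P = prior_var * np.eye(d)
    ll = 0.0
    for t in range(T):
        P = P + q * np.eye(d)            # predict step for random-walk weights
        x = X[t]
        s = x @ P @ x + r                # predictive variance of y_t
        e = y[t] - x @ m                 # innovation (prediction error)
        ll += -0.5 * (np.log(2.0 * np.pi * s) + e**2 / s)
        k = P @ x / s                    # Kalman gain
        m = m + k * e
        P = P - np.outer(k, x @ P)       # posterior covariance update
    return ll

X, y = simulate(500, 3, q=0.01, r=0.5)
```

Maximizing `kalman_loglik` over (q, r), for instance on a grid, gives the maximum likelihood estimate whose lack of guarantees motivates the paper.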
MOPED: Efficient priors for scalable variational inference in Bayesian deep neural networks
Krishnan, Ranganath, Subedar, Mahesh, Tickoo, Omesh
Variational inference for Bayesian deep neural networks (DNNs) requires specifying priors and approximate posterior distributions for neural network weights. Specifying meaningful weight priors is a challenging problem, particularly when scaling variational inference to deeper architectures with high-dimensional weight spaces. We propose the Bayesian MOdel Priors Extracted from Deterministic DNN (MOPED) method for stochastic variational inference, which chooses meaningful prior distributions over the weight space using deterministic weights derived from pretrained DNNs of equivalent architecture. We evaluate the proposed approach on multiple datasets and real-world application domains, with a range of model architectures of varying complexity, to demonstrate that MOPED enables scalable variational inference for Bayesian DNNs. The proposed method achieves faster training convergence and provides reliable uncertainty quantification, without compromising the accuracy provided by the deterministic DNNs. We also propose hybrid Bayesian DNN architectures that combine deterministic and variational layers to balance computational complexity during the prediction phase while still providing the benefits of Bayesian inference. We will release the source code for this work.
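One way to read "priors extracted from deterministic weights" is sketched below: each weight gets a Gaussian prior centred on its pretrained value with a standard deviation proportional to its magnitude, and the variational posterior is initialized to the same distribution. The scale factor `delta`, the softplus parameterization, and all function names are our assumptions rather than the authors' released code.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def moped_init(w_pretrained, delta=0.1, eps=1e-8):
    """Gaussian prior and initial posterior from deterministic weights.

    mean = w, std = delta * |w| (floored at eps so zero weights stay valid).
    The posterior std is stored through rho with std = softplus(rho).
    """
    mu = np.asarray(w_pretrained, dtype=float)
    sigma = np.maximum(delta * np.abs(mu), eps)
    rho = np.log(np.expm1(sigma))  # inverse of softplus
    return {"prior_mu": mu.copy(), "prior_sigma": sigma.copy(),
            "post_mu": mu.copy(), "post_rho": rho}

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2)), summed over weights."""
    return float(np.sum(np.log(sigma_p / sigma_q)
                        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p**2)
                        - 0.5))

def sample_weights(params, rng):
    """Reparameterized weight sample: mu + softplus(rho) * standard normal noise."""
    noise = rng.standard_normal(params["post_mu"].shape)
    return params["post_mu"] + softplus(params["post_rho"]) * noise

params = moped_init(np.array([0.5, -2.0, 0.0]), delta=0.1)
```

Because the posterior starts at the prior, the KL term is zero at initialization and training begins from the pretrained solution, which is consistent with the faster convergence the abstract reports.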
Non-Parametric Calibration for Classification
Wenger, Jonathan, Kjellström, Hedvig, Triebel, Rudolph
Many applications of classification methods require not only high accuracy but also reliable estimation of predictive uncertainty. However, while many current classification frameworks, in particular deep neural network architectures, provide very good results in terms of accuracy, they tend to underestimate their predictive uncertainty. In this paper, we propose a method that corrects the confidence output of a general classifier such that it approaches the true probability of classifying correctly. In contrast to existing approaches, this classifier calibration is based on a non-parametric representation using a latent Gaussian process and is specifically designed for multi-class classification. It can be applied to any classification method that outputs confidence estimates and is not limited to neural networks. We also provide a theoretical analysis of the over- and underconfidence of a classifier and its relationship to calibration. In experiments we show the universally strong performance of our method across different classifiers and benchmark data sets, in contrast to existing classifier calibration techniques.
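To make "confidence approaches the true probability of classifying correctly" measurable, the sketch below computes the expected calibration error (ECE), a standard binned metric for comparing calibration methods. It is an evaluation tool, not the paper's Gaussian-process calibration method; the equal-width bins and the function name are our choices.

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    """Bin samples by confidence, then average |accuracy - mean confidence|
    over bins, weighted by the fraction of samples falling in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)

# An overconfident classifier: 95% confidence but only 50% accuracy.
ece = expected_calibration_error([0.95] * 4, [0, 0, 0, 0], [0, 1, 0, 1])
```

An underestimate of predictive uncertainty, as described above, shows up as mean confidence exceeding accuracy in the high-confidence bins.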
Representation Learning for Words and Entities
This thesis presents new methods for unsupervised learning of distributed representations of words and entities from text and knowledge bases. The first algorithm presented in the thesis is a multi-view algorithm for learning representations of words called Multiview Latent Semantic Analysis (MVLSA). By incorporating up to 46 different types of co-occurrence statistics for the same vocabulary of English words, I show that MVLSA outperforms other state-of-the-art word embedding models. Next, I focus on learning entity representations for search and recommendation and present the second method of this thesis, Neural Variational Set Expansion (NVSE). NVSE is also an unsupervised learning method, but it is based on the Variational Autoencoder framework. Evaluations with human annotators show that NVSE can facilitate better search and recommendation of information gathered from noisy, automatic annotation of unstructured natural language corpora. Finally, I move from unstructured data to structured knowledge graphs and present novel approaches for learning embeddings of vertices and edges in a knowledge graph that obey logical constraints.
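The abstract does not say how MVLSA combines its co-occurrence views. As a hedged sketch of one standard way to merge many views over the same vocabulary, the code below implements MAX-VAR generalized CCA, which finds a shared embedding as the top eigenvectors of the sum of per-view projection matrices. Treating this as MVLSA's core step is our assumption, preprocessing such as PMI weighting is omitted, and the dense n-by-n matrix is only practical for small vocabularies.

```python
import numpy as np

def gcca_maxvar(views, k, reg=1e-3):
    """MAX-VAR generalized CCA.

    views: list of arrays X_j with shape (n, d_j), one row per word.
    Returns G (n, k): the top-k eigenvectors of sum_j P_j, where
    P_j = X_j (X_j^T X_j + reg I)^{-1} X_j^T is a regularized projection.
    """
    n = views[0].shape[0]
    M = np.zeros((n, n))
    for X in views:
        d = X.shape[1]
        M += X @ np.linalg.solve(X.T @ X + reg * np.eye(d), X.T)
    eigvals, eigvecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, top]                     # one k-dim embedding per word

# Three synthetic views that share a 2-dim latent structure Z.
rng = np.random.default_rng(0)
Z = rng.standard_normal((50, 2))
views = [Z @ rng.standard_normal((2, d)) for d in (5, 7, 4)]
G = gcca_maxvar(views, k=2)
```

Directions shared by many views receive large eigenvalues because each view's projection contributes to them, which is how agreement across co-occurrence statistics is rewarded.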
Random Tessellation Forests
Ge, Shufei, Wang, Shijia, Teh, Yee Whye, Wang, Liangliang, Elliott, Lloyd T.
Space partitioning methods such as random forests and the Mondrian process are powerful machine learning methods for multi-dimensional and relational data, and are based on recursively cutting a domain. The flexibility of these methods is often limited by the requirement that the cuts be axis-aligned. The Ostomachion process and the self-consistent binary space partitioning-tree process were recently introduced as generalizations of the Mondrian process for space partitioning with non-axis-aligned cuts in the two-dimensional plane. Motivated by the need for a multi-dimensional partitioning tree with non-axis-aligned cuts, we propose the Random Tessellation Process (RTP), a framework that includes the Mondrian process and the binary space partitioning-tree process as special cases. We derive a sequential Monte Carlo algorithm for inference and provide random forest methods. Our process is self-consistent and can relax axis-aligned constraints, allowing complex inter-dimensional dependence to be captured. We present a simulation study and analyse gene expression data of brain tissue, showing improved accuracies over other methods.
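Since the RTP contains the Mondrian process as a special case, the sketch below samples a Mondrian partition of a box to show the recursive-cutting mechanism: wait an exponential "cost" with rate equal to the box's total side length, stop if the budget is exhausted, otherwise cut an axis chosen in proportion to its length at a uniform position, and recurse. The RTP's non-axis-aligned cut distribution is not reproduced here, and the function name is ours.

```python
import numpy as np

def sample_mondrian(lower, upper, budget, rng):
    """Sample a Mondrian process partition of the box [lower, upper].

    Returns the leaf cells as a list of (lower, upper) array pairs.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    lengths = upper - lower
    linear_dim = lengths.sum()
    cost = rng.exponential(1.0 / linear_dim)    # NumPy takes scale = 1 / rate
    if cost > budget:
        return [(lower, upper)]
    # Axis-aligned cut: pick a dimension in proportion to its side length.
    dim = rng.choice(len(lengths), p=lengths / linear_dim)
    cut = rng.uniform(lower[dim], upper[dim])
    left_upper = upper.copy()
    left_upper[dim] = cut
    right_lower = lower.copy()
    right_lower[dim] = cut
    remaining = budget - cost
    return (sample_mondrian(lower, left_upper, remaining, rng)
            + sample_mondrian(right_lower, upper, remaining, rng))

leaves = sample_mondrian([0.0, 0.0], [1.0, 1.0], budget=3.0,
                         rng=np.random.default_rng(0))
```

Replacing the axis choice with a random cutting direction (a hyperplane through the cell) is the kind of relaxation that the non-axis-aligned processes above introduce.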