Haussmann, Manuel
Latent variable model for high-dimensional point process with structured missingness
Sinelnikov, Maksim, Haussmann, Manuel, Lähdesmäki, Harri
Longitudinal data are important in numerous fields, such as healthcare, sociology, and seismology, but real-world datasets present notable challenges for practitioners because they can be high-dimensional, contain structured missingness patterns, and measurement time points can be governed by an unknown stochastic process. While various solutions have been suggested, the majority of them have been designed to account for only one of these challenges. In this work, we propose a flexible and efficient latent-variable model that is capable of addressing all these limitations. Our approach utilizes Gaussian processes to capture temporal correlations between samples and their associated missingness masks as well as to model the underlying point process. We construct our model as a variational autoencoder with deep neural-network-parameterised encoder and decoder models, and develop a scalable amortised variational inference approach for efficient model training. We demonstrate competitive performance using both simulated and real datasets.
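To make the construction concrete, the following is a minimal sketch of the kind of amortised ELBO such a model optimises: a mask-aware encoder produces per-time-point Gaussian posteriors over latent functions, a decoder maps latents back to observation space, the reconstruction term covers only observed entries, and the KL term is taken against a zero-mean GP prior over each latent dimension. All names, shapes, and the toy linear encoder/decoder below are illustrative assumptions, not the paper's actual architecture.

    # Illustrative sketch (not the paper's code): amortised ELBO of a GP-prior VAE
    # that models observed values together with a missingness mask.
    import numpy as np

    rng = np.random.default_rng(0)
    T, D, L = 30, 8, 2                         # time points, observed dims, latent dims
    t = np.sort(rng.uniform(0, 10, T))         # irregular measurement times
    x = rng.normal(size=(T, D))                # observations (toy data)
    mask = rng.random((T, D)) < 0.7            # 1 = observed, 0 = missing

    def rbf_kernel(t, lengthscale=1.0, var=1.0, jitter=1e-5):
        d = t[:, None] - t[None, :]
        return var * np.exp(-0.5 * (d / lengthscale) ** 2) + jitter * np.eye(len(t))

    # Hypothetical "encoder": per-time-point Gaussian posterior over the latents.
    W_enc = rng.normal(scale=0.1, size=(D * 2, L * 2))
    enc_in = np.concatenate([x * mask, mask], axis=1)          # mask-aware input
    enc_out = enc_in @ W_enc
    mu_q, logvar_q = enc_out[:, :L], enc_out[:, L:]

    # Hypothetical "decoder": latents back to observation space.
    W_dec = rng.normal(scale=0.1, size=(L, D))
    z = mu_q + np.exp(0.5 * logvar_q) * rng.normal(size=(T, L))  # reparameterised sample
    x_hat = z @ W_dec

    # Masked Gaussian log-likelihood: only observed entries contribute.
    sigma_x = 0.5
    loglik = -0.5 * np.sum(mask * (((x - x_hat) / sigma_x) ** 2
                                   + np.log(2 * np.pi * sigma_x ** 2)))

    # KL between the diagonal posterior and a zero-mean GP prior, per latent dim.
    K = rbf_kernel(t)
    K_inv = np.linalg.inv(K)
    _, logdet_K = np.linalg.slogdet(K)
    kl = 0.0
    for l in range(L):
        S_diag = np.exp(logvar_q[:, l])
        kl += 0.5 * (np.trace(K_inv * S_diag)                  # tr(K^{-1} S), S diagonal
                     + mu_q[:, l] @ K_inv @ mu_q[:, l]
                     + logdet_K - np.sum(logvar_q[:, l]) - T)

    elbo = loglik - kl
    print(f"toy ELBO estimate: {elbo:.2f}")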
Estimating treatment effects from single-arm trials via latent-variable modeling
Haussmann, Manuel, Le, Tran Minh Son, Halla-aho, Viivi, Kurki, Samu, Leinonen, Jussi, Koskinen, Miika, Kaski, Samuel, Lähdesmäki, Harri
Randomized controlled trials (RCTs) are the accepted standard for treatment effect estimation, but they can be infeasible due to ethical reasons and prohibitive costs. Single-arm trials, in which all patients belong to the treatment group, can be a viable alternative but require access to an external control group. We propose an identifiable deep latent-variable model for this scenario that can also account for missing covariate observations by modeling their structured missingness patterns. Our method uses amortized variational inference to learn both group-specific and identifiable shared latent representations, which can subsequently be used (i) for patient matching if treatment outcomes are not available for the treatment group, or (ii) for direct treatment effect estimation assuming outcomes are available for both groups. We evaluate the model on a public benchmark as well as on a dataset consisting of a published RCT study and real-world electronic health records. Compared to previous methods, our results show improved performance both for direct treatment effect estimation and for effect estimation via patient matching.
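As a concrete illustration of option (i), the sketch below matches treated patients to external controls by nearest neighbours in a learned shared latent space and reports the mean outcome difference over matched pairs. The latent representations, outcomes, and 1-nearest-neighbour rule are illustrative stand-ins, not the paper's actual procedure.

    # Illustrative sketch (not the paper's code): once a shared latent representation
    # has been learned, treated patients can be matched to external controls by
    # nearest neighbours in latent space, and the effect on the treated estimated
    # as the mean outcome difference over matched pairs.
    import numpy as np

    rng = np.random.default_rng(0)
    n_treat, n_ctrl, latent_dim = 50, 200, 5

    z_treat = rng.normal(size=(n_treat, latent_dim))   # hypothetical shared latents
    z_ctrl = rng.normal(size=(n_ctrl, latent_dim))
    y_treat = rng.normal(loc=1.0, size=n_treat)        # toy outcomes
    y_ctrl = rng.normal(loc=0.0, size=n_ctrl)

    # 1-nearest-neighbour matching in the shared latent space.
    dists = np.linalg.norm(z_treat[:, None, :] - z_ctrl[None, :, :], axis=-1)
    match_idx = dists.argmin(axis=1)

    # Average treatment effect on the treated via matched control outcomes.
    att = np.mean(y_treat - y_ctrl[match_idx])
    print(f"matched-pair effect estimate: {att:.3f}")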
Practical Equivariances via Relational Conditional Neural Processes
Huang, Daolang, Haussmann, Manuel, Remes, Ulpu, John, ST, Clarté, Grégoire, Luck, Kevin Sebastian, Kaski, Samuel, Acerbi, Luigi
Conditional Neural Processes (CNPs) are a class of meta-learning models popular for combining the runtime efficiency of amortized inference with reliable uncertainty quantification. Many relevant machine learning tasks, such as spatio-temporal modeling, Bayesian optimization, and continuous control, inherently contain equivariances -- for example to translation -- which the model can exploit for maximal performance. However, prior attempts to include equivariances in CNPs do not scale effectively beyond two input dimensions. In this work, we propose Relational Conditional Neural Processes (RCNPs), an effective approach to incorporate equivariances into any neural process model. Our proposed method extends the applicability and impact of equivariant neural processes to higher dimensions. We empirically demonstrate the competitive performance of RCNPs on a large array of tasks naturally containing equivariances.
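The core idea of a relational encoding can be illustrated in a few lines: only differences between target and context inputs enter the representation, so translating all inputs by the same shift leaves the encoding unchanged. The mean pooling and the specific feature construction below are simplifying assumptions for illustration, not the RCNP architecture itself.

    # Illustrative sketch (not the paper's code): a "relational" encoding that is
    # invariant to translations of the inputs, because only pairwise differences
    # between target and context locations enter the representation.
    import numpy as np

    rng = np.random.default_rng(0)
    n_ctx, n_tgt, dim_x = 10, 4, 3

    x_ctx = rng.normal(size=(n_ctx, dim_x))   # context inputs
    y_ctx = rng.normal(size=(n_ctx, 1))       # context outputs
    x_tgt = rng.normal(size=(n_tgt, dim_x))   # target inputs

    def relational_encoding(x_tgt, x_ctx, y_ctx):
        # For each target, stack (x_ctx - x_tgt, y_ctx) over all context points.
        diff = x_ctx[None, :, :] - x_tgt[:, None, :]          # (n_tgt, n_ctx, dim_x)
        rel = np.concatenate(
            [diff, np.broadcast_to(y_ctx, diff.shape[:2] + (1,))], axis=-1)
        return rel.mean(axis=1)                               # permutation-invariant pooling

    enc = relational_encoding(x_tgt, x_ctx, y_ctx)

    # Translating every input by the same shift leaves the encoding unchanged.
    shift = rng.normal(size=(1, dim_x))
    enc_shifted = relational_encoding(x_tgt + shift, x_ctx + shift, y_ctx)
    print(np.allclose(enc, enc_shifted))      # True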
Learning Partially Known Stochastic Dynamics with Empirical PAC Bayes
Haussmann, Manuel, Gerwinn, Sebastian, Look, Andreas, Rakitsch, Barbara, Kandemir, Melih
Neural Stochastic Differential Equations model a dynamical environment with neural nets assigned to their drift and diffusion terms. The high expressive power of their nonlinearity comes at the expense of instability in the identification of the large set of free parameters. This paper presents a recipe to improve the prediction accuracy of such models in three steps: i) accounting for epistemic uncertainty by assuming probabilistic weights, ii) incorporating partial knowledge of the state dynamics, and iii) training the resulting hybrid model with an objective derived from an empirical PAC-Bayesian bound. Concretely, we assume access to a differential equation system that describes the dynamics of the target environment with low fidelity, e.g. by describing the vector field on a reduced dimensionality, by ignoring detailed models of some system components, or by avoiding certain dependencies for computational feasibility. We incorporate the ODE system provided by the domain expert into a nonlinear system identification engine, which we choose to be a Bayesian Neural Stochastic Differential Equation (BNSDE) to cover a large scope of dynamical systems, resulting in a hybrid model.
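A minimal sketch of the hybrid-drift idea is given below: an Euler-Maruyama simulator whose drift adds a learned (here randomly initialised) neural correction to a low-fidelity expert ODE, together with a simple diffusion term. The damped-oscillator ODE, the tiny tanh network, and the constant diffusion are illustrative assumptions rather than the paper's BNSDE implementation.

    # Illustrative sketch (not the paper's code): Euler-Maruyama simulation of a
    # hybrid SDE whose drift combines a low-fidelity expert ODE with a (here
    # randomly initialised) neural correction term, plus a simple diffusion.
    import numpy as np

    rng = np.random.default_rng(0)
    state_dim, hidden = 2, 16
    W1 = rng.normal(scale=0.3, size=(state_dim, hidden))   # stand-in "neural net" weights
    W2 = rng.normal(scale=0.3, size=(hidden, state_dim))

    def expert_ode(x):
        # Low-fidelity prior dynamics, e.g. a damped linear oscillator.
        A = np.array([[0.0, 1.0], [-1.0, -0.1]])
        return x @ A.T

    def neural_drift(x):
        return np.tanh(x @ W1) @ W2

    def diffusion(x):
        return 0.05 * np.ones_like(x)

    def simulate(x0, dt=0.01, steps=500):
        x = x0.copy()
        path = [x.copy()]
        for _ in range(steps):
            drift = expert_ode(x) + neural_drift(x)          # hybrid drift
            noise = diffusion(x) * rng.normal(size=x.shape) * np.sqrt(dt)
            x = x + drift * dt + noise
            path.append(x.copy())
        return np.stack(path)

    traj = simulate(np.array([1.0, 0.0]))
    print(traj.shape)   # (501, 2)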
Deep Active Learning with Adaptive Acquisition
Haussmann, Manuel, Hamprecht, Fred A., Kandemir, Melih
Model selection is treated as a standard performance-boosting step in many machine learning applications. Once all other properties of a learning problem are fixed, the model is selected by grid search on a held-out validation set. This is strictly inapplicable to active learning: within the standardized workflow, the acquisition function is chosen among available heuristics a priori, and its success is observed only after the labeling budget is already exhausted. More importantly, none of the earlier studies report a single acquisition heuristic that is consistently successful enough to stand out as the clear best choice. We present a method to break this vicious circle by defining the acquisition function as a learnable predictor and training it with reinforcement feedback collected from each labeling round. As active learning is a scarce-data regime, we bootstrap from a well-known heuristic that filters the bulk of data points on which all heuristics would agree, and learn a policy to warp the top portion of this ranking in the most beneficial way for the character of a specific data distribution. Our system consists of a Bayesian neural net (the predictor), a bootstrap acquisition function, a probabilistic state definition, and another Bayesian policy network that can effectively incorporate this input distribution. We observe on three benchmark data sets that our method always manages either to invent a new superior acquisition function or to adapt itself to the a priori unknown best-performing heuristic for each specific data set.
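The two-stage acquisition scheme can be sketched as follows: a standard uncertainty heuristic (here predictive entropy) pre-filters the unlabelled pool, and a policy then re-scores only the shortlisted candidates to select the next batch to label. The random scorer standing in for the learned policy network and the toy state features are assumptions made purely for illustration.

    # Illustrative sketch (not the paper's code): a bootstrap heuristic (predictive
    # entropy) pre-filters the unlabelled pool, and a stand-in "policy" re-scores
    # only the top candidates to pick the points to label next.
    import numpy as np

    rng = np.random.default_rng(0)
    pool_size, n_classes, top_k, batch = 1000, 10, 50, 5

    probs = rng.dirichlet(np.ones(n_classes), size=pool_size)   # toy predictive probabilities

    # Stage 1: rank the pool by predictive entropy and keep the top_k candidates.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    candidates = np.argsort(entropy)[::-1][:top_k]

    def policy_score(state):
        # Stand-in for a learned policy network; here a fixed random linear scorer.
        w = rng.normal(size=state.shape[1])
        return state @ w

    # Stage 2: the policy warps the ranking of the shortlisted candidates.
    state = np.concatenate([probs[candidates], entropy[candidates, None]], axis=1)
    scores = policy_score(state)
    to_label = candidates[np.argsort(scores)[::-1][:batch]]
    print(to_label)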
Sampling-Free Variational Inference of Bayesian Neural Networks by Variance Backpropagation
Haussmann, Manuel, Hamprecht, Fred A., Kandemir, Melih
We propose a new Bayesian Neural Net formulation that affords variational inference for which the evidence lower bound is analytically tractable, subject to a tight approximation. We achieve this tractability by (i) decomposing ReLU nonlinearities into the product of an identity and a Heaviside step function, and (ii) introducing a separate path that decomposes the neural net expectation from its variance. We demonstrate formally that introducing separate latent binary variables for the activations allows representing the neural network likelihood as a chain of linear operations. Performing variational inference on this construction enables a sampling-free computation of the evidence lower bound that is a more effective approximation than the widely applied Monte Carlo sampling and CLT-related techniques. We evaluate the model on a range of regression and classification tasks against BNN inference alternatives, showing competitive or improved performance over the current state-of-the-art.
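A minimal numerical sketch of the moment propagation behind this idea follows: means and variances are pushed through a linear layer with Gaussian weights, and the ReLU is written as the pre-activation multiplied by a Heaviside gate whose probability comes from the Gaussian pre-activation. Treating the gate as independent of the pre-activation is a simplifying assumption made for illustration, not the paper's exact derivation.

    # Illustrative sketch (not the paper's derivation): propagate a mean and a
    # variance through a linear layer with Gaussian weights, then through a ReLU
    # written as h * step(h), treating the step as a Bernoulli gate z with
    # p = P(h > 0) and (as a simplifying assumption) z independent of h.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    d_in, d_out = 4, 3

    x_mean = rng.normal(size=d_in)             # input mean
    x_var = 0.1 * np.ones(d_in)                # input variance (diagonal)
    w_mean = rng.normal(scale=0.5, size=(d_in, d_out))
    w_var = 0.05 * np.ones((d_in, d_out))      # variational weight variances

    # Moments of the pre-activation h = W^T x for independent x and W.
    h_mean = x_mean @ w_mean
    h_var = (x_var @ (w_mean ** 2)
             + (x_mean ** 2) @ w_var
             + x_var @ w_var)

    # ReLU(h) = z * h with z = step(h); gate probability under the Gaussian h.
    p = norm.cdf(h_mean / np.sqrt(h_var))

    # Approximate post-activation moments assuming z independent of h.
    a_mean = p * h_mean
    a_var = p * (h_var + h_mean ** 2) - a_mean ** 2

    print(a_mean, a_var)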
Bayesian Prior Networks with PAC Training
Haussmann, Manuel, Gerwinn, Sebastian, Kandemir, Melih
We propose to train Bayesian Neural Networks (BNNs) by empirical Bayes as an alternative to posterior weight inference. By approximately marginalizing out an i.i.d. realization of a finite number of sibling weights per data point using the Central Limit Theorem (CLT), we attain a scalable and effective Bayesian deep predictor. This approach directly models the posterior predictive distribution, bypassing the intractable posterior weight inference step. However, it introduces a prohibitively large number of hyperparameters for stable training. As the prior weights are marginalized out and the hyperparameters are optimized, the model also no longer provides a means to incorporate prior knowledge. We overcome both of these drawbacks by deriving a trivial PAC bound that comprises the marginal likelihood of the predictor and a complexity penalty. The outcome integrates organically into the prior networks framework, bringing about an effective and holistic Bayesian treatment of prediction uncertainty. We observe on various regression, classification, and out-of-domain detection benchmarks that our scalable method provides an improved model fit accompanied by significantly better uncertainty estimates compared to the state-of-the-art.
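The general shape of such a PAC-regularized objective, an empirical risk term plus a complexity penalty that grows with a KL divergence and shrinks with the dataset size, can be sketched as below. The McAllester-style square-root penalty and the toy inputs are generic illustrations and not necessarily the specific bound derived in the paper.

    # Illustrative sketch (not the paper's exact bound): a generic PAC-Bayes-style
    # training objective combining an empirical risk term with a complexity penalty
    # that depends on a KL divergence between posterior and prior (hyper)parameters.
    import numpy as np

    def pac_objective(nll_per_sample, kl, n, delta=0.05):
        """Empirical risk plus a McAllester-style complexity penalty."""
        empirical_risk = np.mean(nll_per_sample)
        penalty = np.sqrt((kl + np.log(2.0 * np.sqrt(n) / delta)) / (2.0 * n))
        return empirical_risk + penalty

    # Toy usage: the penalty shrinks with more data and grows with a larger KL term.
    rng = np.random.default_rng(0)
    nll = rng.gamma(shape=2.0, size=1000)      # stand-in per-sample negative log-likelihoods
    print(pac_objective(nll, kl=50.0, n=1000))
    print(pac_objective(nll, kl=50.0, n=100_000))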