Bayesian Learning


Decentralized Online Ensembles of Gaussian Processes for Multi-Agent Systems

arXiv.org Machine Learning

Flexible and scalable decentralized learning solutions are fundamentally important in the application of multi-agent systems. While several recent approaches introduce (ensembles of) kernel machines in the distributed setting, Bayesian solutions are much more limited. We introduce a fully decentralized, asymptotically exact solution to computing the random feature approximation of Gaussian processes. We further address the choice of hyperparameters by introducing an ensembling scheme for Bayesian multiple kernel learning based on online Bayesian model averaging. The resulting algorithm is tested against Bayesian and frequentist methods on simulated and real-world datasets.
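To make the two ingredients of this abstract concrete, here is a minimal single-agent sketch. It assumes an RBF kernel approximated by random Fourier features, with an exact recursive Bayesian linear-regression update on the feature weights. It also assumes an online Bayesian model averaging step that reweights candidate kernels (here, different lengthscales) by their one-step-ahead predictive likelihood. The class and function names (`RandomFeatureGP`, `bma_step`) are hypothetical. The decentralized, multi-agent consensus part of the paper is not shown.

```python
import numpy as np


class RandomFeatureGP:
    """GP regression with an RBF kernel approximated by random Fourier
    features; the weight posterior is updated one observation at a time."""

    def __init__(self, dim, n_features, lengthscale, noise_var, rng):
        # Spectral samples for the RBF kernel exp(-||x - x'||^2 / (2 l^2)).
        self.W = rng.normal(scale=1.0 / lengthscale, size=(n_features, dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.scale = np.sqrt(2.0 / n_features)
        self.noise_var = noise_var
        # Gaussian prior N(0, I) on the feature weights.
        self.mean = np.zeros(n_features)
        self.cov = np.eye(n_features)

    def features(self, x):
        return self.scale * np.cos(self.W @ x + self.b)

    def predict(self, x):
        """Predictive mean and variance of y at input x."""
        phi = self.features(x)
        return phi @ self.mean, phi @ self.cov @ phi + self.noise_var

    def update(self, x, y):
        """Exact recursive (Kalman-style) posterior update for one observation."""
        phi = self.features(x)
        cov_phi = self.cov @ phi
        s = phi @ cov_phi + self.noise_var
        gain = cov_phi / s
        self.mean = self.mean + gain * (y - phi @ self.mean)
        self.cov = self.cov - np.outer(gain, cov_phi)


def bma_step(models, log_weights, x, y):
    """One online BMA step: predict with the mixture, reweight, then update."""
    means, variances = zip(*(m.predict(x) for m in models))
    means, variances = np.array(means), np.array(variances)
    ensemble_mean = np.exp(log_weights) @ means
    # Each model is reweighted by its predictive likelihood of the new y.
    log_lik = -0.5 * (np.log(2.0 * np.pi * variances) + (y - means) ** 2 / variances)
    log_weights = log_weights + log_lik
    log_weights = log_weights - np.logaddexp.reduce(log_weights)
    for m in models:
        m.update(x, y)
    return log_weights, ensemble_mean


# Usage: two candidate lengthscales, streaming samples of y = sin(x).
rng = np.random.default_rng(0)
models = [RandomFeatureGP(1, 100, ls, 0.01, rng) for ls in (0.01, 1.0)]
log_w = np.log(np.full(len(models), 1.0 / len(models)))
for x in rng.uniform(0.0, 2.0 * np.pi, size=200):
    log_w, _ = bma_step(models, log_w, np.array([x]), np.sin(x))
```

Updating in log space and normalizing with `np.logaddexp.reduce` keeps the model weights numerically stable even when the candidates' cumulative likelihoods differ by hundreds of nats.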


Deep Learning Models for Physical Layer Communications

arXiv.org Artificial Intelligence

The increased availability of data and computing resources has enabled researchers to successfully adopt machine learning (ML) techniques and make significant contributions in several engineering areas. ML, and in particular deep learning (DL), algorithms have been shown to perform better in tasks where a physical bottom-up description of the phenomenon is lacking and/or is mathematically intractable. Indeed, they exploit observations of natural phenomena to automatically acquire knowledge and learn internal relations. Despite its historically model-based mindset, communications engineering has recently started shifting its focus toward top-down, data-driven learning models, especially in domains such as channel modeling and physical layer design, where in most cases no general optimal strategies are known. In this thesis, we aim to solve some fundamental open challenges in physical layer communications by exploiting new DL paradigms. In particular, we mathematically formulate, in ML terms, classic problems such as channel capacity and optimal coding-decoding schemes for an arbitrary communication medium. We design and develop the architecture, algorithm and code necessary to train the equivalent DL model, and finally, we propose novel solutions to long-standing problems in the field.


Time Series Analysis of Rankings: A GARCH-Type Approach

arXiv.org Machine Learning

Ranking data are frequently collected nowadays, yet few methods exist for treating such data when they are observed over time. The present paper contributes to this topic by proposing and developing novel models for time series of ranking data. We introduce a class of time-varying ranking models inspired by Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) models. More specifically, the temporal dynamics are defined through the conditional distribution of the current ranking given the past rankings, which is assumed to be a Mallows distribution and therefore implicitly depends on a distance between rankings. Autoregressive and feedback components are then incorporated into the model through the conditional expectation of the associated distances. Theoretical properties of our ranking GARCH models, such as stationarity and ergodicity, are established. Parameters are estimated by maximum likelihood when the data are fully observed, and we develop a Monte Carlo Expectation-Maximisation algorithm to handle missing data. Monte Carlo simulation studies assess the performance of the proposed estimators under both complete and missing data scenarios. Finally, the proposed ranking GARCH models are applied to real data on the weekly ranking of professional tennis players from 2015 to 2019.
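The following sketch illustrates the building blocks under stated assumptions. It uses a Mallows model $P(\pi) \propto \exp(-\theta\, d(\pi, \pi_0))$ with the Kendall tau distance, computed by brute-force enumeration, so it is only feasible for a handful of items. A bisection step inverts the map $\theta \mapsto \mathbb{E}_\theta[d]$. The GARCH(1,1)-style recursion $m_t = \omega + \alpha d_{t-1} + \beta m_{t-1}$ for the conditional expected distance is a hypothetical form chosen for illustration, not necessarily the paper's exact specification. All function names are illustrative.

```python
import itertools
import math


def kendall_tau_distance(p, q):
    """Number of item pairs ordered differently by rankings p and q,
    where p[i] is the rank assigned to item i."""
    n = len(p)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (p[i] - p[j]) * (q[i] - q[j]) < 0
    )


def mallows_pmf(theta, center):
    """Mallows model P(pi) proportional to exp(-theta * d(pi, center)), by enumeration."""
    perms = list(itertools.permutations(range(len(center))))
    weights = [math.exp(-theta * kendall_tau_distance(p, center)) for p in perms]
    z = sum(weights)
    return {p: w / z for p, w in zip(perms, weights)}


def expected_distance(theta, center):
    pmf = mallows_pmf(theta, center)
    return sum(prob * kendall_tau_distance(p, center) for p, prob in pmf.items())


def theta_for_expected_distance(target, center, lo=-20.0, hi=20.0, iters=60):
    """Invert the strictly decreasing map theta -> E_theta[d] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_distance(mid, center) > target:
            lo = mid  # distances too large on average: increase theta
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conditional_expected_distances(distances, omega, alpha, beta, m0):
    """Hypothetical GARCH(1,1)-style recursion m_t = omega + alpha*d_{t-1} + beta*m_{t-1}."""
    m = [m0]
    for d in distances[:-1]:
        m.append(omega + alpha * d + beta * m[-1])
    return m
```

Under this assumed form, each $m_t$ would be mapped to a dispersion $\theta_t$ through `theta_for_expected_distance`, so large recent distances (volatile rankings) produce a flatter Mallows distribution at the next step.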


Efficient distributional regression trees learning algorithms for calibrated non-parametric probabilistic forecasts

arXiv.org Artificial Intelligence

Developing trustworthy AI for critical applications in science and engineering requires machine learning techniques that can estimate their own uncertainty. In regression, instead of estimating only a conditional mean, this can be achieved by producing a predictive interval for the output, or even by learning a model of the conditional probability $p(y|x)$ of an output $y$ given input features $x$. While this can be done under parametric assumptions, e.g., with generalized linear models, such assumptions are typically too strong, and non-parametric models offer flexible alternatives. In particular, for scalar outputs, directly learning a model of the conditional cumulative distribution function of $y$ given $x$ can lead to more precise probabilistic estimates, and using proper scoring rules such as the weighted interval score (WIS) and the continuous ranked probability score (CRPS) leads to better coverage and calibration properties. This paper introduces novel algorithms for learning probabilistic regression trees under the WIS or CRPS loss functions. These algorithms are made computationally efficient through an appropriate use of known data structures, namely min-max heaps, weight-balanced binary trees and Fenwick trees. Through numerical experiments, we demonstrate that our methods are competitive with alternative approaches. Additionally, our methods benefit from the inherent interpretability and explainability of trees. As a by-product, we show how our trees can be used in the context of conformal prediction and explain why they are particularly well suited to achieving group-conditional coverage guarantees.
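As a rough illustration of the CRPS loss for a tree leaf, here is a sketch. It assumes each leaf predicts the empirical distribution of its training targets and uses the identity $\mathrm{CRPS}(F, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$. For sorted values, the pairwise term reduces to $\sum_{i,j}|x_i - x_j| = 2\sum_k (2k - m + 1)\,x_{(k)}$ (0-indexed), and the in-sample leaf loss simplifies to $\sum_{i,j}|y_i - y_j| / (2m)$. The split search below is a naive exhaustive version with hypothetical names. It is not the paper's heap/Fenwick-tree algorithm, which avoids re-sorting each candidate leaf.

```python
import numpy as np


def _sorted_pairwise_sum(x_sorted):
    """sum_{i,j} |x_i - x_j| for sorted x in O(m): 2 * sum_k (2k - m + 1) x_(k)."""
    m = len(x_sorted)
    k = np.arange(m)
    return 2.0 * np.sum((2 * k - m + 1) * x_sorted)


def crps_empirical(samples, y):
    """CRPS of the empirical distribution of `samples` at y: E|X - y| - 0.5 E|X - X'|."""
    x = np.sort(np.asarray(samples, dtype=float))
    m = len(x)
    return np.abs(x - y).mean() - 0.5 * _sorted_pairwise_sum(x) / m**2


def leaf_crps(y):
    """Sum over the leaf's own targets of CRPS(F_leaf, y_i), where F_leaf is their
    empirical distribution; this equals sum_{i,j} |y_i - y_j| / (2m)."""
    y = np.sort(np.asarray(y, dtype=float))
    return _sorted_pairwise_sum(y) / (2.0 * len(y))


def best_split(x, y):
    """Naive exhaustive search for the threshold on one feature that minimises
    the summed in-sample CRPS of the two child leaves."""
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    best_threshold, best_loss = None, np.inf
    for i in range(1, len(xs)):
        if xs[i - 1] == xs[i]:
            continue  # no threshold separates equal feature values
        loss = leaf_crps(ys[:i]) + leaf_crps(ys[i:])
        if loss < best_loss:
            best_threshold, best_loss = 0.5 * (xs[i - 1] + xs[i]), loss
    return best_threshold, best_loss
```

For example, `best_split([0, 1, 2, 3, 4, 5], [0, 0, 0, 10, 10, 10])` separates the two constant groups at the threshold 2.5 with zero in-sample loss.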


PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders

arXiv.org Machine Learning

Learning informative representations of phylogenetic tree structures is essential for analyzing evolutionary relationships. Classical distance-based methods have been widely used to project phylogenetic trees into Euclidean space, but they are often sensitive to the choice of distance metric and may lack sufficient resolution. In this paper, we introduce phylogenetic variational autoencoders (PhyloVAEs), an unsupervised learning framework designed for representation learning and generative modeling of tree topologies. Leveraging an efficient encoding mechanism inspired by autoregressive tree topology generation, we develop a deep latent-variable generative model that facilitates fast, parallelized topology generation. PhyloVAE combines this generative model with a collaborative inference model based on learnable topological features, allowing for high-resolution representations of phylogenetic tree samples. Extensive experiments demonstrate PhyloVAE's robust representation learning capabilities and fast generation of phylogenetic tree topologies.

Phylogenetic trees are the foundational structure for describing the evolutionary processes among individuals or groups of biological entities. Reconstructing these trees from collected biological sequences (e.g., DNA, RNA, protein) of observed species, also known as phylogenetic inference (Felsenstein, 2004), is an essential discipline of computational biology (Fitch, 1971; Felsenstein, 1981; Yang & Rannala, 1997; Ronquist et al., 2012).
Large collections of trees obtained from these approaches (e.g., posterior samples from MCMC runs (Ronquist et al., 2012)), however, are often difficult to summarize or visualize due to the discrete and non-Euclidean nature of the tree topology space. The classical approach to visualizing and analyzing distributions of phylogenetic trees is to calculate pairwise distances between the trees and project them onto a plane using multidimensional scaling (MDS) (Amenta & Klingner, 2002; Hillis et al., 2005; Jombart et al., 2017). However, these approaches cannot map an arbitrary point in the visualization back to a tree, and therefore do not form an actual visualization of the relevant tree space.
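The classical distance-plus-MDS pipeline described above can be sketched in a few lines. This assumes the Robinson-Foulds distance, one common choice; the text does not say which distance the cited works use. Each unrooted tree is encoded, hypothetically, as the set of its non-trivial splits, and classical (Torgerson) MDS embeds the distance matrix in two dimensions. The sketch also shows the limitation noted above: the output is a set of coordinates with no inverse map from the plane back to tree topologies.

```python
import numpy as np


def robinson_foulds(splits_a, splits_b):
    """RF distance: number of splits present in exactly one of the two trees."""
    return len(splits_a ^ splits_b)


def classical_mds(dist, dim=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix into `dim` dimensions."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Negative eigenvalues (from non-Euclidean distances) are clipped to zero.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


# Hypothetical encoding: each unrooted tree on taxa A..E is the set of its
# non-trivial splits, each recorded as the side that does not contain "A".
trees = [
    {frozenset("CDE"), frozenset("DE")},  # ((A,B),C,(D,E))
    {frozenset("BDE"), frozenset("DE")},  # ((A,C),B,(D,E))
    {frozenset("BCE"), frozenset("CE")},  # ((A,D),B,(C,E))
]
dist = np.array([[robinson_foulds(s, t) for t in trees] for s in trees])
coords = classical_mds(dist, dim=2)
```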


Active Learning of Model Discrepancy with Bayesian Experimental Design

arXiv.org Artificial Intelligence

Digital twins have been actively explored in many engineering applications, such as manufacturing and autonomous systems. However, model discrepancy is ubiquitous in digital twin models and significantly affects their performance. In recent years, data-driven modeling techniques have shown promise in characterizing the model discrepancy of existing models. However, the training data for learning the model discrepancy are often collected empirically, whereas an active approach to gathering informative data could benefit the learning of model discrepancy. Bayesian experimental design (BED), on the other hand, provides a systematic approach to gathering the most informative data, but its performance is often degraded by model discrepancy. In this work, we build on sequential BED and propose an efficient approach to iteratively learn the model discrepancy from the data selected by BED. The proposed method is validated on a classical numerical example governed by a convection-diffusion equation, for which full BED is still feasible. It is then further studied on the same numerical example with a high-dimensional model discrepancy, serving as a demonstration of scenarios in which full BED is no longer practical. An ensemble-based approximation of the information gain is further used to assess data informativeness and to enhance the learning of model discrepancy. The results show that the proposed method is efficient and robust for actively learning high-dimensional model discrepancy using data suggested by sequential BED. We also demonstrate that the proposed method is compatible with both classical numerical solvers and modern auto-differentiable solvers.
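One common way to approximate information gain from an ensemble, assumed here because the abstract does not give the paper's exact estimator, is a Gaussian approximation. For a scalar observation $y = G(\theta, d) + \varepsilon$ with noise variance $\sigma_\varepsilon^2$, the expected information gain at design $d$ is approximately $\tfrac{1}{2}\log\!\left(1 + \sigma_G^2(d)/\sigma_\varepsilon^2\right)$, where $\sigma_G^2(d)$ is the spread of the ensemble's predictions at $d$. The sketch picks the next design as the candidate with the largest approximate gain. The function names, candidate designs and ensemble values are hypothetical.

```python
import numpy as np


def ensemble_information_gain(predictions, noise_var):
    """Gaussian approximation of expected information gain per candidate design.

    predictions: array of shape (n_ensemble, n_candidates) holding each ensemble
    member's predicted observation (e.g. model plus learned discrepancy) at each design.
    """
    pred_var = np.var(predictions, axis=0, ddof=1)
    return 0.5 * np.log1p(pred_var / noise_var)


def select_next_design(candidates, predictions, noise_var):
    """Return the candidate with the largest approximate information gain."""
    eig = ensemble_information_gain(predictions, noise_var)
    best = int(np.argmax(eig))
    return candidates[best], eig


# Usage with three hypothetical sensor locations and a three-member ensemble.
candidates = np.array([0.1, 0.5, 0.9])
predictions = np.array([
    [0.0, 0.0, 2.0],
    [0.0, 1.0, -2.0],
    [0.0, -1.0, 0.0],
])
design, eig = select_next_design(candidates, predictions, noise_var=0.1)
```

In a sequential loop, the observation collected at `design` would be used to update the discrepancy ensemble before the gains are recomputed for the next step.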


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Rebuttal: thank you for your clarifications. I still think that learning kernel (parameters) from multiple realizations of a GP is not very novel in general, but sufficiently novel in your specific context to get discussed at NIPS. The authors use Gaussian processes to learn human function extrapolation behaviour from human sample data. After a comprehensive literature review, they introduce the main idea of the paper: learn the kernel parameters by maximizing the conditional probability of the extrapolation data given the training data. To allow for flexible kernel shapes, they use spectral mixture kernels.
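The objective summarized in this review can be written as $\log p(\mathbf{y}_\text{extrap} \mid \mathbf{y}_\text{train})$ under a GP. Below is a sketch that evaluates it, assuming a zero-mean GP on 1-D inputs, Gaussian observation noise and the one-dimensional spectral mixture kernel $k(\tau) = \sum_q w_q \exp(-2\pi^2\tau^2 v_q)\cos(2\pi\tau\mu_q)$. Fitting the kernel would mean maximizing this quantity over the weights, means and variances, which is omitted here. The function names and parameter values are illustrative.

```python
import numpy as np


def spectral_mixture_kernel(x1, x2, weights, means, variances):
    """1-D spectral mixture kernel: sum_q w_q exp(-2 pi^2 tau^2 v_q) cos(2 pi tau mu_q)."""
    tau = np.subtract.outer(x1, x2)
    k = np.zeros_like(tau, dtype=float)
    for w, mu, v in zip(weights, means, variances):
        k += w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * tau * mu)
    return k


def gaussian_logpdf(y, mean, cov):
    """Log density of N(mean, cov) at y via a Cholesky factorisation."""
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, y - mean)
    return -0.5 * z @ z - np.log(np.diag(chol)).sum() - 0.5 * len(y) * np.log(2.0 * np.pi)


def extrapolation_log_prob(x_train, y_train, x_extrap, y_extrap, kernel_params, noise_var):
    """log p(y_extrap | y_train) under a zero-mean GP with a spectral mixture kernel."""
    def k(a, b):
        return spectral_mixture_kernel(a, b, *kernel_params)

    K_tt = k(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_et = k(x_extrap, x_train)
    K_ee = k(x_extrap, x_extrap) + noise_var * np.eye(len(x_extrap))
    cond_mean = K_et @ np.linalg.solve(K_tt, y_train)
    cond_cov = K_ee - K_et @ np.linalg.solve(K_tt, K_et.T)
    cond_cov = 0.5 * (cond_cov + cond_cov.T)  # symmetrise against round-off
    return gaussian_logpdf(y_extrap, cond_mean, cond_cov)


# Usage: score an extrapolation of a sine curve under two mixture components.
params = ([1.0, 0.5], [0.1, 0.5], [0.01, 0.05])  # (weights, means, variances)
x_t = np.linspace(0.0, 5.0, 20)
x_e = np.linspace(5.5, 7.0, 5)
lp = extrapolation_log_prob(x_t, np.sin(x_t), x_e, np.sin(x_e), params, noise_var=0.1)
```

Conditioning on the training data, rather than scoring all points jointly, is what distinguishes this objective from the usual marginal likelihood. The two are related by $\log p(\mathbf{y}_e \mid \mathbf{y}_t) = \log p(\mathbf{y}_t, \mathbf{y}_e) - \log p(\mathbf{y}_t)$.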


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

The paper describes tricks to scale Bayesian network structure learning to thousands of variables. This is achieved by developing new heuristics for candidate parent set identification and for the subsequent order-based structure optimization. In general, the paper is clearly written and easy to read. There are issues in editing and style, but they do not affect readability (much). The suggested heuristics feel a bit ad hoc, so the value of the work is ultimately judged by the empirical evaluation.
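To clarify what "order-based structure optimization" refers to, here is a generic sketch, not the reviewed paper's heuristics. It assumes candidate parent sets with precomputed, decomposable local scores (higher is better, e.g. BIC). Given a variable order, each node independently takes its best-scoring candidate parent set whose members all precede it, which guarantees acyclicity. A simple hill climb over adjacent swaps then searches the space of orders. The score values and function names are hypothetical, and each node's candidate list must include the empty parent set.

```python
def best_network_for_order(order, candidate_scores):
    """Pick, for every node, the best candidate parent set consistent with `order`.

    candidate_scores: node -> list of (frozenset_of_parents, local_score).
    Each list must contain the empty parent set so a consistent choice always exists.
    """
    position = {node: i for i, node in enumerate(order)}
    network, total = {}, 0.0
    for node in order:
        consistent = [
            (parents, score)
            for parents, score in candidate_scores[node]
            if all(position[p] < position[node] for p in parents)
        ]
        parents, score = max(consistent, key=lambda item: item[1])
        network[node] = parents
        total += score
    return network, total


def order_hill_climb(initial_order, candidate_scores):
    """First-improvement hill climbing over orders via adjacent transpositions.
    Stops at a local optimum of the order score."""
    order = list(initial_order)
    _, best = best_network_for_order(order, candidate_scores)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            trial = order[:i] + [order[i + 1], order[i]] + order[i + 2:]
            _, score = best_network_for_order(trial, candidate_scores)
            if score > best:
                order, best, improved = trial, score, True
    network, best = best_network_for_order(order, candidate_scores)
    return order, network, best


# Usage with hypothetical local scores favouring the chain A -> B -> C.
scores = {
    "A": [(frozenset(), -10.0)],
    "B": [(frozenset(), -10.0), (frozenset({"A"}), -2.0)],
    "C": [(frozenset(), -10.0), (frozenset({"B"}), -1.0), (frozenset({"A"}), -5.0)],
}
order, network, total = order_hill_climb(["B", "A", "C"], scores)
```

Because the search can stall at a local optimum (from some starting orders it does here), practical order-based methods typically add random restarts or richer moves.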


Review for NeurIPS paper: Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets

Neural Information Processing Systems

Summary and Contributions: The paper considers the problem of sampling from the posterior distribution in Bayesian inference. More precisely, it addresses stochastic sampling that relies only on a minibatch of data at each iteration. To achieve rapid mixing between isolated modes, the authors run parallel tempered chains and introduce replica-exchange steps into stochastic Nosé-Hoover dynamics. The crux of this approach is the stochastic test for the replica-exchange step. To develop such a test, the authors follow the paper "An efficient minibatch acceptance test for Metropolis-Hastings", which introduces the concept of a correction distribution.
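For context on the replica-exchange step discussed in this review, here is a full-data sketch of the swap test. It assumes tempered targets $\pi_\beta(\theta) \propto \exp(-\beta U(\theta))$ with energy $U$. Swapping the states of chains at inverse temperatures $\beta_i$ and $\beta_j$ then has log acceptance ratio $(\beta_i - \beta_j)(U_i - U_j)$. The sketch shows the Metropolis rule and the Barker rule, which accepts iff the log ratio plus standard logistic noise is positive. The minibatch test the review describes builds on the Barker form: the exact log ratio is replaced by a noisy minibatch estimate, and a draw from a correction distribution supplies the remaining noise so that the total is approximately logistic. That correction distribution is not computed here. Function names are illustrative.

```python
import numpy as np


def swap_log_ratio(beta_i, beta_j, energy_i, energy_j):
    """Log acceptance ratio for exchanging the states of chains i and j."""
    return (beta_i - beta_j) * (energy_i - energy_j)


def metropolis_swap(beta_i, beta_j, energy_i, energy_j, rng):
    """Accept with probability min(1, exp(log_ratio))."""
    return np.log(rng.uniform()) < swap_log_ratio(beta_i, beta_j, energy_i, energy_j)


def barker_swap(beta_i, beta_j, energy_i, energy_j, rng):
    """Barker test: accept iff log_ratio + L > 0 with L ~ Logistic(0, 1),
    i.e. with probability sigmoid(log_ratio)."""
    return swap_log_ratio(beta_i, beta_j, energy_i, energy_j) + rng.logistic() > 0.0
```

When the hotter chain ($\beta_j < \beta_i$) holds the lower-energy state, the log ratio is positive and the swap is favored, which is how low-energy states discovered at high temperature propagate to the target chain.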


Review for NeurIPS paper: Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets

Neural Information Processing Systems

The paper proposes a novel MCMC-type algorithm to perform Bayesian inference on large datasets. It combines replica exchange, Nosé-Hoover dynamics and a non-standard acceptance criterion that handles minibatches. All the reviewers participated actively in the discussion after the rebuttal was made available. Although all the ingredients of the proposed method already exist, their combination is original and potentially useful for the ML literature, as pointed out by most reviewers. Theorem 2 is also neat and provides a nice way to propose swaps between replicas using minibatches.