Bayesian Inference
Wasserstein PAC-Bayes Learning: Exploiting Optimisation Guarantees to Explain Generalisation
Haddouche, Maxime, Guedj, Benjamin
PAC-Bayes learning is an established framework to both assess the generalisation ability of learning algorithms, and design new learning algorithm by exploiting generalisation bounds as training objectives. Most of the exisiting bounds involve a \emph{Kullback-Leibler} (KL) divergence, which fails to capture the geometric properties of the loss function which are often useful in optimisation. We address this by extending the emerging \emph{Wasserstein PAC-Bayes} theory. We develop new PAC-Bayes bounds with Wasserstein distances replacing the usual KL, and demonstrate that sound optimisation guarantees translate to good generalisation abilities. In particular we provide generalisation bounds for the \emph{Bures-Wasserstein SGD} by exploiting its optimisation properties.
Real-time gravitational-wave science with neural posterior estimation
Dax, Maximilian, Green, Stephen R., Gair, Jonathan, Macke, Jakob H., Buonanno, Alessandra, Schรถlkopf, Bernhard
We demonstrate unprecedented accuracy for rapid gravitational-wave parameter estimation with deep learning. Using neural networks as surrogates for Bayesian posterior distributions, we analyze eight gravitational-wave events from the first LIGO-Virgo Gravitational-Wave Transient Catalog and find very close quantitative agreement with standard inference codes, but with inference times reduced from O(day) to a minute per event. Our networks are trained using simulated data, including an estimate of the detector-noise characteristics near the event. This encodes the signal and noise models within millions of neural-network parameters, and enables inference for any observed data consistent with the training distribution, accounting for noise nonstationarity from event to event. Our algorithm -- called "DINGO" -- sets a new standard in fast-and-accurate inference of physical parameters of detected gravitational-wave events, which should enable real-time data analysis without sacrificing accuracy.
Utilizing Social Media Attributes for Enhanced Keyword Detection: An IDF-LDA Model Applied to Sina Weibo
With the rapid development of social media such as Twitter and Weibo, detecting keywords from a huge volume of text data streams in real-time has become a critical problem. The keyword detection problem aims at searching important information from massive text data to reflect the most important events or topics. However, social media data usually has unique features: the documents are usually short, the language is colloquial, and the data is likely to have significant temporal patterns. Therefore, it could be challenging to discover critical information from these text streams. In this paper, we propose a novel method to address the keyword detection problem in social media. Our model combines the Inverse Document Frequency (IDF) and Latent Dirichlet Allocation (LDA) models to better cope with the distinct attributes of social media data, such as the number of likes, comments, and retweets. By weighting the importance of each document based on these attributes, our method can effectively detect more representative keywords over time. Comprehensive experiments conducted under various conditions on Weibo data illustrate that our approach outperforms the baselines in various evaluation metrics, including precision and recall for multiple problem settings.
Compositional diversity in visual concept learning
Zhou, Yanli, Feinman, Reuben, Lake, Brenden M.
Humans leverage compositionality to efficiently learn new concepts, understanding how familiar parts can combine together to form novel objects. In contrast, popular computer vision models struggle to make the same types of inferences, requiring more data and generalizing less flexibly than people do. Here, we study these distinctively human abilities across a range of different types of visual composition, examining how people classify and generate ``alien figures'' with rich relational structure. We also develop a Bayesian program induction model which searches for the best programs for generating the candidate visual figures, utilizing a large program space containing different compositional mechanisms and abstractions. In few shot classification tasks, we find that people and the program induction model can make a range of meaningful compositional generalizations, with the model providing a strong account of the experimental data as well as interpretable parameters that reveal human assumptions about the factors invariant to category membership (here, to rotation and changing part attachment). In few shot generation tasks, both people and the models are able to construct compelling novel examples, with people behaving in additional structured ways beyond the model capabilities, e.g. making choices that complete a set or reconfiguring existing parts in highly novel ways. To capture these additional behavioral patterns, we develop an alternative model based on neuro-symbolic program induction: this model also composes new concepts from existing parts yet, distinctively, it utilizes neural network modules to successfully capture residual statistical structure. Together, our behavioral and computational findings show how people and models can produce a rich variety of compositional behavior when classifying and generating visual objects.
Class Conditional Gaussians for Continual Learning
Dealing with representation shift is one of the main problems in online continual learning. Current methods mainly solve this by reducing representation shift, but leave the classifier on top of the representation to slowly adapt, in many update steps, to the remaining representation shift, increasing forgetting. We propose DeepCCG, an empirical Bayesian approach to solve this problem. DeepCCG works by updating the posterior of a class conditional Gaussian classifier such that the classifier adapts instantly to representation shift. The use of a class conditional Gaussian classifier also enables DeepCCG to use a log conditional marginal likelihood loss to update the representation, which can be seen as a new type of replay. To perform the update to the classifier and representation, DeepCCG maintains a fixed number of examples in memory and so a key part of DeepCCG is selecting what examples to store, choosing the subset that minimises the KL divergence between the true posterior and the posterior induced by the subset. We demonstrate the performance of DeepCCG on a range of settings, including those with overlapping tasks which thus far have been under-explored. In the experiments, DeepCCG outperforms all other methods, evidencing its potential.
Safety of autonomous vehicles: A survey on Model-based vs. AI-based approaches
Iberraken, Dimia, Adouane, Lounis
The growing advancements in Autonomous Vehicles (AVs) have emphasized the critical need to prioritize the absolute safety of AV maneuvers, especially in dynamic and unpredictable environments or situations. This objective becomes even more challenging due to the uniqueness of every traffic situation/condition. To cope with all these very constrained and complex configurations, AVs must have appropriate control architectures with reliable and real-time Risk Assessment and Management Strategies (RAMS). These targeted RAMS must lead to reduce drastically the navigation risks. However, the lack of safety guarantees proves, which is one of the key challenges to be addressed, limit drastically the ambition to introduce more broadly AVs on our roads and restrict the use of AVs to very limited use cases. Therefore, the focus and the ambition of this paper is to survey research on autonomous vehicles while focusing on the important topic of safety guarantee of AVs. For this purpose, it is proposed to review research on relevant methods and concepts defining an overall control architecture for AVs, with an emphasis on the safety assessment and decision-making systems composing these architectures. Moreover, it is intended through this reviewing process to highlight researches that use either model-based methods or AI-based approaches. This is performed while emphasizing the strengths and weaknesses of each methodology and investigating the research that proposes a comprehensive multi-modal design that combines model-based and AI approaches. This paper ends with discussions on the methods used to guarantee the safety of AVs namely: safety verification techniques and the standardization/generalization of safety frameworks.
A Bayesian sparse factor model with adaptive posterior concentration
Ohn, Ilsang, Lin, Lizhen, Kim, Yongdai
In this paper, we propose a new Bayesian inference method for a high-dimensional sparse factor model that allows both the factor dimensionality and the sparse structure of the loading matrix to be inferred. The novelty is to introduce a certain dependence between the sparsity level and the factor dimensionality, which leads to adaptive posterior concentration while keeping computational tractability. We show that the posterior distribution asymptotically concentrates on the true factor dimensionality, and more importantly, this posterior consistency is adaptive to the sparsity level of the true loading matrix and the noise variance. We also prove that the proposed Bayesian model attains the optimal detection rate of the factor dimensionality in a more general situation than those found in the literature. Moreover, we obtain a near-optimal posterior concentration rate of the covariance matrix. Numerical studies are conducted and show the superiority of the proposed method compared with other competitors.
Exhaustive Symbolic Regression
Bartlett, Deaglan J., Desmond, Harry, Ferreira, Pedro G.
Symbolic Regression (SR) algorithms attempt to learn analytic expressions which fit data accurately and in a highly interpretable manner. Conventional SR suffers from two fundamental issues which we address here. First, these methods search the space stochastically (typically using genetic programming) and hence do not necessarily find the best function. Second, the criteria used to select the equation optimally balancing accuracy with simplicity have been variable and subjective. To address these issues we introduce Exhaustive Symbolic Regression (ESR), which systematically and efficiently considers all possible equations -- made with a given basis set of operators and up to a specified maximum complexity -- and is therefore guaranteed to find the true optimum (if parameters are perfectly optimised) and a complete function ranking subject to these constraints. We implement the minimum description length principle as a rigorous method for combining these preferences into a single objective. To illustrate the power of ESR we apply it to a catalogue of cosmic chronometers and the Pantheon+ sample of supernovae to learn the Hubble rate as a function of redshift, finding $\sim$40 functions (out of 5.2 million trial functions) that fit the data more economically than the Friedmann equation. These low-redshift data therefore do not uniquely prefer the expansion history of the standard model of cosmology. We make our code and full equation sets publicly available.
Graph Neural Processes for Spatio-Temporal Extrapolation
Hu, Junfeng, Liang, Yuxuan, Fan, Zhencheng, Chen, Hongyang, Zheng, Yu, Zimmermann, Roger
We study the task of spatio-temporal extrapolation that generates data at target locations from surrounding contexts in a graph. This task is crucial as sensors that collect data are sparsely deployed, resulting in a lack of fine-grained information due to high deployment and maintenance costs. Existing methods either use learning-based models like Neural Networks or statistical approaches like Gaussian Processes for this task. However, the former lacks uncertainty estimates and the latter fails to capture complex spatial and temporal correlations effectively. To address these issues, we propose Spatio-Temporal Graph Neural Processes (STGNP), a neural latent variable model which commands these capabilities simultaneously. Specifically, we first learn deterministic spatio-temporal representations by stacking layers of causal convolutions and cross-set graph neural networks. Then, we learn latent variables for target locations through vertical latent state transitions along layers and obtain extrapolations. Importantly during the transitions, we propose Graph Bayesian Aggregation (GBA), a Bayesian graph aggregator that aggregates contexts considering uncertainties in context data and graph structure. Extensive experiments show that STGNP has desirable properties such as uncertainty estimates and strong learning capabilities, and achieves state-of-the-art results by a clear margin.
Predicting Rare Events by Shrinking Towards Proportional Odds
Training classifiers is difficult with severe class imbalance, but many rare events are the culmination of a sequence with much more common intermediate outcomes. For example, in online marketing a user first sees an ad, then may click on it, and finally may make a purchase; estimating the probability of purchases is difficult because of their rarity. We show both theoretically and through data experiments that the more abundant data in earlier steps may be leveraged to improve estimation of probabilities of rare events. We present PRESTO, a relaxation of the proportional odds model for ordinal regression. Instead of estimating weights for one separating hyperplane that is shifted by separate intercepts for each of the estimated Bayes decision boundaries between adjacent pairs of categorical responses, we estimate separate weights for each of these transitions. We impose an L1 penalty on the differences between weights for the same feature in adjacent weight vectors in order to shrink towards the proportional odds model. We prove that PRESTO consistently estimates the decision boundary weights under a sparsity assumption. Synthetic and real data experiments show that our method can estimate rare probabilities in this setting better than both logistic regression on the rare category, which fails to borrow strength from more abundant categories, and the proportional odds model, which is too inflexible.