Bayesian Learning
Perception Compressor: A training-free prompt compression method in long context scenarios
Tang, Jiwei, Xu, Jin, Lu, Tingwei, Zhang, Zhicheng, Zhao, Yiming, Hai, Lin, Zheng, Hai-Tao
Large Language Models (LLMs) demonstrate exceptional capabilities in various scenarios. However, in long context scenarios they are hampered by large amounts of redundant information and are sensitive to the position of key information (relevant to the input question), leading to inferior performance. To address these challenges, we present Perception Compressor, a training-free prompt compression method. It includes a perception retriever that leverages guiding questions and instruction to retrieve the most relevant demonstrations, a dual-slope ratio allocator to dynamically allocate compression ratios and open-book ratios, and a semi-guided iterative compression that retains key information at the token level while removing tokens that distract the LLM. We conduct extensive experiments on long context benchmarks, i.e., NaturalQuestions, LongBench, and MuSiQue. Experiment results show that Perception Compressor outperforms existing methods by a large margin, achieving state-of-the-art performance.
Proxy-informed Bayesian transfer learning with unknown sources
Sloman, Sabina J., Martinelli, Julien, Kaski, Samuel
Generalization outside the scope of one's training data requires leveraging prior knowledge about the effects that transfer, and the effects that don't, between different data sources. Bayesian transfer learning is a principled paradigm for specifying this knowledge, and refining it on the basis of data from the source (training) and target (prediction) tasks. We address the challenging transfer learning setting where the learner (i) cannot fine-tune in the target task, and (ii) does not know which source data points correspond to the same task (i.e., the data sources are unknown). We propose a proxy-informed robust method for probabilistic transfer learning (PROMPT), which provides a posterior predictive estimate tailored to the structure of the target task, without requiring the learner have access to any outcome information from the target task. Instead, PROMPT relies on the availability of proxy information. PROMPT uses the same proxy information for two purposes: (i) estimation of effects specific to the target task, and (ii) construction of a robust reweighting of the source data for estimation of effects that transfer between tasks. We provide theoretical results on the effect of this reweighting on the risk of negative transfer, and demonstrate application of PROMPT in two synthetic settings.
Graph Agnostic Causal Bayesian Optimisation
Mukherjee, Sumantrak, Zhang, Mengyan, Flaxman, Seth, Vollmer, Sebastian Josef
We study the problem of globally optimising a target variable of an unknown causal graph on which a sequence of soft or hard interventions can be performed. The problem of optimising the target variable associated with a causal graph is formalised as Causal Bayesian Optimisation (CBO). We study the CBO problem under the cumulative regret objective with unknown causal graphs for two settings, namely structural causal models with hard interventions and function networks with soft interventions. We propose Graph Agnostic Causal Bayesian Optimisation (GACBO), an algorithm that actively discovers the causal structure that contributes to achieving optimal rewards. GACBO seeks to balance exploiting the actions that give the best rewards against exploring the causal structures and functions. To the best of our knowledge, our work is the first to study causal Bayesian optimization with cumulative regret objectives in scenarios where the graph is unknown or partially known. We show our proposed algorithm outperforms baselines in simulated experiments and real-world applications.
Your copula is a classifier in disguise: classification-based copula density estimation
Huk, David, Steel, Mark, Dutta, Ritabrata
We propose reinterpreting copula density estimation as a discriminative task. Under this novel estimation scheme, we train a classifier to distinguish samples from the joint density from those of the product of independent marginals, recovering the copula density in the process. We derive equivalences between well-known copula classes and classification problems naturally arising in our interpretation. Furthermore, we show our estimator achieves theoretical guarantees akin to maximum likelihood estimation. By identifying a connection with density ratio estimation, we benefit from the rich literature and models available for such problems. Empirically, we demonstrate the applicability of our approach by estimating copulas of real and high-dimensional datasets, outperforming competing copula estimators in density evaluation as well as sampling.
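The classification view described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a bivariate Gaussian dependence with standard normal margins, a hand-written logistic regression on quadratic features, and illustrative values for `rho` and `n`. With balanced classes, the fitted logit estimates the log ratio of the joint density to the product of the marginals, which is the log copula density evaluated at the corresponding normal scores.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.7, 20_000  # illustrative values, not taken from the paper

# Class 1: dependent pairs with standard normal margins and correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
joint = rng.multivariate_normal(np.zeros(2), cov, size=n)
# Class 0: same margins, dependence destroyed by shuffling one column.
indep = np.column_stack([joint[:, 0], rng.permutation(joint[:, 1])])


def features(z):
    """Quadratic features [1, x^2, y^2, x*y] of an (m, 2) array."""
    x, y = z[:, 0], z[:, 1]
    return np.column_stack([np.ones_like(x), x * x, y * y, x * y])


def fit_logistic(F, labels, steps=25, ridge=1e-6):
    """Logistic regression fitted with Newton's method (tiny ridge for stability)."""
    w = np.zeros(F.shape[1])
    for _ in range(steps):
        logits = np.clip(F @ w, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-logits))
        grad = F.T @ (p - labels) + ridge * w
        hess = (F * (p * (1.0 - p))[:, None]).T @ F + ridge * np.eye(F.shape[1])
        w = w - np.linalg.solve(hess, grad)
    return w


F = np.vstack([features(joint), features(indep)])
labels = np.concatenate([np.ones(n), np.zeros(n)])
w = fit_logistic(F, labels)


def log_copula_density(z, w):
    """Classifier logit = estimated log density ratio at normal scores z."""
    return features(np.atleast_2d(z)) @ w


def true_log_density(z, rho):
    """Closed-form log ratio of the bivariate normal to its independent margins."""
    z = np.atleast_2d(z)
    x, y = z[:, 0], z[:, 1]
    quad = rho**2 * (x * x + y * y) - 2.0 * rho * x * y
    return -0.5 * np.log(1.0 - rho**2) - quad / (2.0 * (1.0 - rho**2))


point = np.array([[1.0, 1.0]])
print("estimated:", log_copula_density(point, w))
print("exact:    ", true_log_density(point, rho))
```

In this Gaussian case the exact log ratio is linear in the chosen features, so a logistic model can represent it without approximation error. For other dependence structures a more flexible classifier would be needed, and real data would first be mapped to a common marginal scale (for example via ranks).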
First observations of the seiche that shook the world
Monahan, Thomas, Tang, Tianning, Roberts, Stephen, Adcock, Thomas A. A.
Extreme events are evolving as a direct consequence of climate change, leading to the emergence of new, previously unobserved phenomena [1, 2]. In remote regions like the Arctic, where in-situ measurements are sparse, scientists must increasingly depend on analytical and numerical models to explore these events. However, modeling in such regions presents significant challenges due to the uncertainties in the data required to calibrate and validate these models [3]. Consequently, large simplifications are often necessary, resulting in substantial discrepancies between observed and modeled phenomena. The mysterious 10.88 mHz very-long-period (VLP) seismic signal, which appeared following a tsunamigenic landslide in the Dickson Fjord, Greenland, on September 16th, 2023, and the subsequent interdisciplinary scientific efforts to determine its origin, underscore these challenges. Two independent studies [4, 5] have hypothesized that the signal was driven by a standing wave, or seiche, which formed in the aftermath of the tsunami. While it is well-documented that seiches can form in resonant enclosed and semi-enclosed basins [6], the loading-induced tilt they produce has only been observed locally (< 30 km) and for short durations (< 1 hour) [5, 7]. Moreover, no prior evidence exists of persistent fluid sloshing (lasting several days) without an external driver.
Tabular Data Synthesis with Differential Privacy: A Survey
Yang, Mengmeng, Chi, Chi-Hung, Lam, Kwok-Yan, Feng, Jie, Guo, Taolin, Ni, Wei
Data sharing is a prerequisite for collaborative innovation, enabling organizations to leverage diverse datasets for deeper insights. In real-world applications like FinTech and Smart Manufacturing, transactional data, often in tabular form, are generated and analyzed for insight generation. However, such datasets typically contain sensitive personal/business information, raising privacy concerns and regulatory risks. Data synthesis tackles this by generating artificial datasets that preserve the statistical characteristics of real data, removing direct links to individuals. However, attackers can still infer sensitive information using background knowledge. Differential privacy offers a solution by providing provable and quantifiable privacy protection. Consequently, differentially private data synthesis has emerged as a promising approach to privacy-aware data sharing. This paper provides a comprehensive overview of existing differentially private tabular data synthesis methods, highlighting the unique challenges of each generation model for generating tabular data under differential privacy constraints. We classify the methods into statistical and deep learning-based approaches based on their generation models, discussing them in both centralized and distributed environments. We evaluate and compare those methods within each category, highlighting their strengths and weaknesses in terms of utility, privacy, and computational complexity. Additionally, we present and discuss various evaluation methods for assessing the quality of the synthesized data, identify research gaps in the field and directions for future research.
Generative Unfolding with Distribution Mapping
Butter, Anja, Diefenbacher, Sascha, Huetsch, Nathan, Mikuni, Vinicius, Nachman, Benjamin, Schweitzer, Sofia Palacios, Plehn, Tilman
Machine learning enables unbinned, highly-differential cross section measurements. A recent idea uses generative models to morph a starting simulation into the unfolded data. We show how to extend two morphing techniques, Schrödinger Bridges and Direct Diffusion, in order to ensure that the models learn the correct conditional probabilities. This brings distribution mapping to a similar level of accuracy as the state-of-the-art conditional generative unfolding methods. Numerical results are presented with a standard benchmark dataset of single jet substructure as well as for a new dataset describing a 22-dimensional phase space of Z + 2-jets.
Compositional simulation-based inference for time series
Gloeckler, Manuel, Toyota, Shoji, Fukumizu, Kenji, Macke, Jakob H.
Amortized simulation-based inference (SBI) methods train neural networks on simulated data to perform Bayesian inference. While this approach avoids the need for tractable likelihoods, it often requires a large number of simulations and has been challenging to scale to time-series data. Scientific simulators frequently emulate real-world dynamics through thousands of single-state transitions over time. We propose an SBI framework that can exploit such Markovian simulators by locally identifying parameters consistent with individual state transitions. We then compose these local results to obtain a posterior over parameters that align with the entire time series observation. We focus on applying this approach to neural posterior score estimation but also show how it can be applied, e.g., to neural likelihood (ratio) estimation. We demonstrate that our approach is more simulation-efficient than directly estimating the global posterior on several synthetic benchmark tasks and simulators used in ecology and epidemiology. Numerical simulations are a central approach for tackling problems in a wide range of scientific and engineering disciplines, including physics (Brehmer & Cranmer, 2022; Dax et al., 2021), molecular dynamics (Hollingsworth & Dror, 2018), neuroscience (Gonçalves et al., 2020) and climate science (Watson-Parris et al., 2021). Simulators often include at least some parameters that cannot be measured experimentally. Inferring such parameters from observed data is a fundamental challenge. Bayesian inference provides a principled approach to identifying parameters that align with empirical observations. Standard algorithms for Bayesian inference, such as Markov Chain Monte Carlo (MCMC) (Gilks et al., 1995) and variational inference (Beal, 2003), generally require access to the likelihoods p(x|θ). However, for many simulators, directly evaluating the likelihood remains intractable, rendering conventional Bayesian approaches inapplicable.
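The idea of composing local results for a Markovian simulator can be written as a standard identity. This is a sketch under assumptions not stated in the abstract: the initial state x_0 does not depend on θ, each local posterior is formed with a distribution over x_t that is independent of θ, and the notation x_{0:T} for a series with T transitions is ours rather than the authors'.

```latex
\begin{align}
p(\theta \mid x_{0:T}) &\propto p(\theta) \prod_{t=0}^{T-1} p(x_{t+1} \mid x_t, \theta), \\
p(\theta \mid x_t, x_{t+1}) &\propto p(\theta)\, p(x_{t+1} \mid x_t, \theta), \\
\Rightarrow \quad p(\theta \mid x_{0:T}) &\propto p(\theta)^{1-T} \prod_{t=0}^{T-1} p(\theta \mid x_t, x_{t+1}).
\end{align}
```

Taking the logarithm and then the gradient with respect to θ turns the last line into a sum of local posterior scores plus a prior correction term, which suggests why a score-based estimator is a natural fit for this kind of composition.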
Stein Variational Newton Neural Network Ensembles
Flöge, Klemens, Moeed, Mohammed Abdul, Fortuin, Vincent
Deep neural network ensembles are powerful tools for uncertainty quantification, which have recently been re-interpreted from a Bayesian perspective. However, current methods inadequately leverage second-order information of the loss landscape, despite the recent availability of efficient Hessian approximations. We propose a novel approximate Bayesian inference method that modifies deep ensembles to incorporate Stein Variational Newton updates. Our approach uniquely integrates scalable modern Hessian approximations, achieving faster convergence and more accurate posterior distribution approximations. We validate the effectiveness of our method on diverse regression and classification tasks, demonstrating superior performance with a significantly reduced number of training epochs compared to existing ensemble-based methods, while enhancing uncertainty quantification and robustness against overfitting.
A Bayesian explanation of machine learning models based on modes and functional ANOVA
Most methods in explainable AI (XAI) focus on providing reasons for the prediction of a given set of features. However, we solve an inverse explanation problem, i.e., given the deviation of a label, find the reasons for this deviation. We use a Bayesian framework to recover the "true" features, conditioned on the observed label value. We efficiently explain the deviation of a label value from the mode, by identifying and ranking the influential features using the "distances" in the ANOVA functional decomposition. We show that the new method is more human-intuitive and robust than methods based on mean values, e.g., SHapley Additive exPlanations (SHAP values). The extra costs of solving a Bayesian inverse problem are dimension-independent.