Bayesian Learning
Reweighting Improves Conditional Risk Bounds
Zhang, Yikai, Lin, Jiahe, Li, Fengpei, Zheng, Songzhu, Raj, Anant, Schneider, Anderson, Nevmyvaka, Yuriy
In this work, we study the weighted empirical risk minimization (weighted ERM) schema, in which an additional data-dependent weight function is incorporated when the empirical risk function is being minimized. We show that under a general ``balanceable" Bernstein condition, one can design a weighted ERM estimator to achieve superior performance in certain sub-regions over the one obtained from standard ERM, and the superiority manifests itself through a data-dependent constant term in the error bound. These sub-regions correspond to large-margin ones in classification settings and low-variance ones in heteroscedastic regression settings, respectively. Our findings are supported by evidence from synthetic data experiments.
Practical machine learning is learning on small samples
Based on limited observations, machine learning discerns a dependence which is expected to hold in the future. What makes it possible? Statistical learning theory imagines indefinitely increasing training sample to justify its approach. In reality, there is no infinite time or even infinite general population for learning. Here I argue that practical machine learning is based on an implicit assumption that underlying dependence is relatively ``smooth" : likely, there are no abrupt differences in feedback between cases with close data points. From this point of view learning shall involve selection of the hypothesis ``smoothly" approximating the training set. I formalize this as Practical learning paradigm. The paradigm includes terminology and rules for description of learners. Popular learners (local smoothing, k-NN, decision trees, Naive Bayes, SVM for classification and for regression) are shown here to be implementations of this paradigm.
Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry
Zimmermann, Yoel, Bazgir, Adib, Afzal, Zartashia, Agbere, Fariha, Ai, Qianxiang, Alampara, Nawaf, Al-Feghali, Alexander, Ansari, Mehrad, Antypov, Dmytro, Aswad, Amro, Bai, Jiaru, Baibakova, Viktoriia, Biswajeet, Devi Dutta, Bitzek, Erik, Bocarsly, Joshua D., Borisova, Anna, Bran, Andres M, Brinson, L. Catherine, Calderon, Marcel Moran, Canalicchio, Alessandro, Chen, Victor, Chiang, Yuan, Circi, Defne, Charmes, Benjamin, Chaudhary, Vikrant, Chen, Zizhang, Chiu, Min-Hsueh, Clymo, Judith, Dabhadkar, Kedar, Daelman, Nathan, Datar, Archit, de Jong, Wibe A., Evans, Matthew L., Fard, Maryam Ghazizade, Fisicaro, Giuseppe, Gangan, Abhijeet Sadashiv, George, Janine, Gonzalez, Jose D. Cojal, Gรถtte, Michael, Gupta, Ankur K., Harb, Hassan, Hong, Pengyu, Ibrahim, Abdelrahman, Ilyas, Ahmed, Imran, Alishba, Ishimwe, Kevin, Issa, Ramsey, Jablonka, Kevin Maik, Jones, Colin, Josephson, Tyler R., Juhasz, Greg, Kapoor, Sarthak, Kang, Rongda, Khalighinejad, Ghazal, Khan, Sartaaj, Klawohn, Sascha, Kuman, Suneel, Ladines, Alvin Noe, Leang, Sarom, Lederbauer, Magdalena, Sheng-Lun, null, Liao, null, Liu, Hao, Liu, Xuefeng, Lo, Stanley, Madireddy, Sandeep, Maharana, Piyush Ranjan, Maheshwari, Shagun, Mahjoubi, Soroush, Mรกrquez, Josรฉ A., Mills, Rob, Mohanty, Trupti, Mohr, Bernadette, Moosavi, Seyed Mohamad, Moรhammer, Alexander, Naghdi, Amirhossein D., Naik, Aakash, Narykov, Oleksandr, Nรคsstrรถm, Hampus, Nguyen, Xuan Vu, Ni, Xinyi, O'Connor, Dana, Olayiwola, Teslim, Ottomano, Federico, Ozhan, Aleyna Beste, Pagel, Sebastian, Parida, Chiku, Park, Jaehee, Patel, Vraj, Patyukova, Elena, Petersen, Martin Hoffmann, Pinto, Luis, Pizarro, Josรฉ M., Plessers, Dieter, Pradhan, Tapashree, Pratiush, Utkarsh, Puli, Charishma, Qin, Andrew, Rajabi, Mahyar, Ricci, Francesco, Risch, Elliot, Rรญos-Garcรญa, Martiรฑo, Roy, Aritra, Rug, Tehseen, Sayeed, Hasan M, Scheidgen, Markus, Schilling-Wilhelmi, Mara, Schloz, Marcel, Schรถppach, Fabian, Schumann, Julia, Schwaller, Philippe, Schwarting, Marcus, Sharlin, Samiha, Shen, Kevin, Shi, Jiale, Si, Pradip, D'Souza, Jennifer, Sparks, Taylor, Sudhakar, Suraj, Talirz, Leopold, Tang, Dandan, Taran, Olga, Terboven, Carla, Tropin, Mark, Tsymbal, Anastasiia, Ueltzen, Katharina, Unzueta, Pablo Andres, Vasan, Archit, Vinchurkar, Tirtha, Vo, Trung, Vogel, Gabriel, Vรถlker, Christoph, Weinreich, Jan, Yang, Faradawn, Zaki, Mohd, Zhang, Chi, Zhang, Sylvester, Zhang, Weijie, Zhu, Ruijie, Zhu, Shang, Janssen, Jan, Li, Calvin, Foster, Ian, Blaiszik, Ben
Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations, resulting in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) molecular and material design; (3) automation and novel interfaces; (4) scientific communication and education; (5) research data management and automation; (6) hypothesis generation and evaluation; and (7) knowledge extraction and reasoning from scientific literature. Each team submission is presented in a summary table with links to the code and as brief papers in the appendix. Beyond team results, we discuss the hackathon event and its hybrid format, which included physical hubs in Toronto, Montreal, San Francisco, Berlin, Lausanne, and Tokyo, alongside a global online hub to enable local and virtual collaboration. Overall, the event highlighted significant improvements in LLM capabilities since the previous year's hackathon, suggesting continued expansion of LLMs for applications in materials science and chemistry research. These outcomes demonstrate the dual utility of LLMs as both multipurpose models for diverse machine learning tasks and platforms for rapid prototyping custom applications in scientific research.
Multi-View Majority Vote Learning Algorithms: Direct Minimization of PAC-Bayesian Bounds
Hennequin, Mehdi, Zitouni, Abdelkrim, Benabdeslem, Khalid, Elghazel, Haytham, Gaci, Yacine
The PAC-Bayesian framework has significantly advanced the understanding of statistical learning, particularly for majority voting methods. Despite its successes, its application to multi-view learning -- a setting with multiple complementary data representations -- remains underexplored. In this work, we extend PAC-Bayesian theory to multi-view learning, introducing novel generalization bounds based on R\'enyi divergence. These bounds provide an alternative to traditional Kullback-Leibler divergence-based counterparts, leveraging the flexibility of R\'enyi divergence. Furthermore, we propose first- and second-order oracle PAC-Bayesian bounds and extend the C-bound to multi-view settings. To bridge theory and practice, we design efficient self-bounding optimization algorithms that align with our theoretical results.
Summarizing Bayesian Nonparametric Mixture Posterior -- Sliced Optimal Transport Metrics for Gaussian Mixtures
Existing methods to summarize posterior inference for mixture models focus on identifying a point estimate of the implied random partition for clustering, with density estimation as a secondary goal (Wade and Ghahramani, 2018; Dahl et al., 2022). We propose a novel approach for summarizing posterior inference in nonparametric Bayesian mixture models, prioritizing density estimation of the mixing measure (or mixture) as an inference target. One of the key features is the model-agnostic nature of the approach, which remains valid under arbitrarily complex dependence structures in the underlying sampling model. Using a decision-theoretic framework, our method identifies a point estimate by minimizing posterior expected loss. A loss function is defined as a discrepancy between mixing measures. Estimating the mixing measure implies inference on the mixture density and the random partition. Exploiting the discrete nature of the mixing measure, we use a version of sliced Wasserstein distance. We introduce two specific variants for Gaussian mixtures. The first, mixed sliced Wasserstein, applies generalized geodesic projections on the product of the Euclidean space and the manifold of symmetric positive definite matrices. The second, sliced mixture Wasserstein, leverages the linearity of Gaussian mixture measures for efficient projection.
Marketing Mix Modeling in Lemonade
Marketing mix modeling (MMM) is a widely used method to assess the effectiveness of marketing campaigns and optimize marketing strategies. Bayesian MMM is an advanced approach that allows for the incorporation of prior information, uncertainty quantification, and probabilistic predictions (1). In this paper, we describe the process of building a Bayesian MMM model for the online insurance company Lemonade. We first collected data on Lemonade's marketing activities, such as online advertising, social media, and brand marketing, as well as performance data. We then used a Bayesian framework to estimate the contribution of each marketing channel on total performance, while accounting for various factors such as seasonality, market trends, and macroeconomic indicators. To validate the model, we compared its predictions with the actual performance data from A/B-testing and sliding window holdout data (2). The results showed that the predicted contribution of each marketing channel is aligned with A/B test performance and is actionable. Furthermore, we conducted several scenario analyses using convex optimization to test the sensitivity of the model to different assumptions and to evaluate the impact of changes in the marketing mix on sales. The insights gained from the model allowed Lemonade to adjust their marketing strategy and allocate their budget more effectively. Our case study demonstrates the benefits of using Bayesian MMM for marketing attribution and optimization in a data-driven company like Lemonade. The approach is flexible, interpretable, and can provide valuable insights for decision-making.
Bayesian Active Learning By Distribution Disagreement
Werner, Thorben, Schmidt-Thieme, Lars
The ever growing need for data for machine learning science and applications has fueled a long history of Active Learning (AL) research, as it is able to reduce the amount of annotations necessary to train strong models. However, most research was done for classification problems, as it is generally easier to derive uncertainty quantification (UC) from classification output without changing the model or training procedure. This feat is a lot less common for regression models, with few historic exceptions like Gaussian Processes. This leads to regression problems being under-researched in AL literature. In this paper, we are focusing specifically on the area of regression and recent models with uncertainty quantification (UC) in the architecture. Recently, two main approaches of UC for regression problems have been researched: Firstly, Gaussian neural networks (GNN) [6, 14], which use a neural network to parametrize ยต and ฯ parameters and build a Gaussian predictive distribution and secondly, Normalizing Flows [16, 4], which are parametrizing a free-form predictive distribution with invertible transformations to be able to model more complex target distributions. Their predictive distributions allow these models to not only be trained via Negative Log Likelihood (NLL), but also to draw samples from the predictive distribution as well as to compute the log likelihood of any given point y. Recent works [2, 1] have investigated the potential of uncertainty quantification with normalizing flows by experimenting on synthetic experiments with a known ground-truth uncertainty. Intuitively, a predictive distribution should inertly allow for a good uncertainty quantification (e.g.
Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers
In the era of generative AI, deep generative models (DGMs) with latent representations have gained tremendous popularity. Despite their impressive empirical performance, the statistical properties of these models remain underexplored. DGMs are often overparametrized, non-identifiable, and uninterpretable black boxes, raising serious concerns when deploying them in high-stakes applications. Motivated by this, we propose an interpretable deep generative modeling framework for rich data types with discrete latent layers, called Deep Discrete Encoders (DDEs). A DDE is a directed graphical model with multiple binary latent layers. Theoretically, we propose transparent identifiability conditions for DDEs, which imply progressively smaller sizes of the latent layers as they go deeper. Identifiability ensures consistent parameter estimation and inspires an interpretable design of the deep architecture. Computationally, we propose a scalable estimation pipeline of a layerwise nonlinear spectral initialization followed by a penalized stochastic approximation EM algorithm. This procedure can efficiently estimate models with exponentially many latent components. Extensive simulation studies validate our theoretical results and demonstrate the proposed algorithms' excellent performance. We apply DDEs to three diverse real datasets for hierarchical topic modeling, image representation learning, response time modeling in educational testing, and obtain interpretable findings.
Adaptive posterior distributions for uncertainty analysis of covariance matrices in Bayesian inversion problems for multioutput signals
Curbelo, E., Martino, L., Llorente, F., Delgado-Gomez, D.
In this paper we address the problem of performing Bayesian inference for the parameters of a nonlinear multi-output model and the covariance matrix of the different output signals. We propose an adaptive importance sampling (AIS) scheme for multivariate Bayesian inversion problems, which is based in two main ideas: the variables of interest are split in two blocks and the inference takes advantage of known analytical optimization formulas. We estimate both the unknown parameters of the multivariate non-linear model and the covariance matrix of the noise. In the first part of the proposed inference scheme, a novel AIS technique called adaptive target adaptive importance sampling (ATAIS) is designed, which alternates iteratively between an IS technique over the parameters of the non-linear model and a frequentist approach for the covariance matrix of the noise. In the second part of the proposed inference scheme, a prior density over the covariance matrix is considered and the cloud of samples obtained by ATAIS are recycled and re-weighted to obtain a complete Bayesian study over the model parameters and covariance matrix. ATAIS is the main contribution of the work. Additionally, the inverted layered importance sampling (ILIS) is presented as a possible compelling algorithm (but based on a conceptually simpler idea). Different numerical examples show the benefits of the proposed approaches
Amortized Bayesian Experimental Design for Decision-Making
Huang, Daolang, Guo, Yujia, Acerbi, Luigi, Kaski, Samuel
Many critical decisions, such as personalized medical diagnoses and product pricing, are made based on insights gained from designing, observing, and analyzing a series of experiments. This highlights the crucial role of experimental design, which goes beyond merely collecting information on system parameters as in traditional Bayesian experimental design (BED), but also plays a key part in facilitating downstream decision-making. Most recent BED methods use an amortized policy network to rapidly design experiments. However, the information gathered through these methods is suboptimal for down-the-line decision-making, as the experiments are not inherently designed with downstream objectives in mind. In this paper, we present an amortized decision-aware BED framework that prioritizes maximizing downstream decision utility. We introduce a novel architecture, the Transformer Neural Decision Process (TNDP), capable of instantly proposing the next experimental design, whilst inferring the downstream decision, thus effectively amortizing both tasks within a unified workflow. We demonstrate the performance of our method across several tasks, showing that it can deliver informative designs and facilitate accurate decision-making.