Bayesian Inference
Variational Inference with Mixtures of Isotropic Gaussians
Petit-Talamon, Marguerite, Lambert, Marc, Korba, Anna
Variational inference (VI) is a popular approach in Bayesian inference, that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. In this paper, we focus on the following parametric family: mixtures of isotropic Gaussians (i.e., with diagonal covariance matrices proportional to the identity) and uniform weights. We develop a variational framework and provide efficient algorithms suited for this family. In contrast with mixtures of Gaussian with generic covariance matrices, this choice presents a balance between accurate approximations of multimodal Bayesian posteriors, while being memory and computationally efficient. Our algorithms implement gradient descent on the location of the mixture components (the modes of the Gaussians), and either (an entropic) Mirror or Bures descent on their variance parameters. We illustrate the performance of our algorithms on numerical experiments.
Uncovering Social Network Activity Using Joint User and Topic Interaction
Abel, Gaspard, Kalogeratos, Argyris, Nadal, Jean-Pierre, Randon-Furling, Julien
The emergence of online social platforms, such as social networks and social media, has drastically affected the way people apprehend the information flows to which they are exposed. In such platforms, various information cascades spreading among users is the main force creating complex dynamics of opinion formation, each user being characterized by their own behavior adoption mechanism. Moreover, the spread of multiple pieces of information or beliefs in a networked population is rarely uncorrelated. In this paper, we introduce the Mixture of Interacting Cascades (MIC), a model of marked multidimensional Hawkes processes with the capacity to model jointly non-trivial interaction between cascades and users. We emphasize on the interplay between information cascades and user activity, and use a mixture of temporal point processes to build a coupled user/cascade point process model. Experiments on synthetic and real data highlight the benefits of this approach and demonstrate that MIC achieves superior performance to existing methods in modeling the spread of information cascades. Finally, we demonstrate how MIC can provide, through its learned parameters, insightful bi-layered visualizations of real social network activity data.
Efficient Network Automatic Relevance Determination
Zhang, Hongwei, Ye, Ziqi, Wang, Xinyuan, Guo, Xin, Xu, Zenglin, Cheng, Yuan, Hu, Zixin, Qi, Yuan
We propose Network Automatic Relevance Determination (NARD), an extension of ARD for linearly probabilistic models, to simultaneously model sparse relationships between inputs $X \in \mathbb R^{d \times N}$ and outputs $Y \in \mathbb R^{m \times N}$, while capturing the correlation structure among the $Y$. NARD employs a matrix normal prior which contains a sparsity-inducing parameter to identify and discard irrelevant features, thereby promoting sparsity in the model. Algorithmically, it iteratively updates both the precision matrix and the relationship between $Y$ and the refined inputs. To mitigate the computational inefficiencies of the $\mathcal O(m^3 + d^3)$ cost per iteration, we introduce Sequential NARD, which evaluates features sequentially, and a Surrogate Function Method, leveraging an efficient approximation of the marginal likelihood and simplifying the calculation of determinant and inverse of an intermediate matrix. Combining the Sequential update with the Surrogate Function method further reduces computational costs. The computational complexity per iteration for these three methods is reduced to $\mathcal O(m^3+p^3)$, $\mathcal O(m^3 + d^2)$, $\mathcal O(m^3+p^2)$, respectively, where $p \ll d$ is the final number of features in the model. Our methods demonstrate significant improvements in computational efficiency with comparable performance on both synthetic and real-world datasets.
Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks
Lee, Dongwoo, Lee, Dong Bok, Adriaensen, Steven, Lee, Juho, Hwang, Sung Ju, Hutter, Frank, Kim, Seon Joo, Lee, Hae Beom
Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling laws often follow the power-law and proposed several variants of power-law functions to predict the scaling behavior at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications involving decision-making problems such as determining the expected performance improvements achievable by investing additional computational resources. In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation. Specifically, we design a prior distribution that enables the sampling of infinitely many synthetic functions resembling real-world neural scaling laws, allowing our PFN to meta-learn the extrapolation. We validate the effectiveness of our approach on real-world neural scaling laws, comparing it against both the existing point estimation methods and Bayesian approaches. Our method demonstrates superior performance, particularly in data-limited scenarios such as Bayesian active learning, underscoring its potential for reliable, uncertainty-aware extrapolation in practical applications.
Wireless Channel Identification via Conditional Diffusion Model
Li, Yuan, Zheng, Zhong, Liu, Chang, Fei, Zesong
The identification of channel scenarios in wireless systems plays a crucial role in channel modeling, radio fingerprint positioning, and transceiver design. Traditional methods to classify channel scenarios are based on typical statistical characteristics of channels, such as K-factor, path loss, delay spread, etc. However, statistic-based channel identification methods cannot accurately differentiate implicit features induced by dynamic scatterers, thus performing very poorly in identifying similar channel scenarios. In this paper, we propose a novel channel scenario identification method, formulating the identification task as a maximum a posteriori (MAP) estimation. Furthermore, the MAP estimation is reformulated by a maximum likelihood estimation (MLE), which is then approximated and solved by the conditional generative diffusion model. Specifically, we leverage a transformer network to capture hidden channel features in multiple latent noise spaces within the reverse process of the conditional generative diffusion model. These detailed features, which directly affect likelihood functions in MLE, enable highly accurate scenario identification. Experimental results show that the proposed method outperforms traditional methods, including convolutional neural networks (CNNs), back-propagation neural networks (BPNNs), and random forest-based classifiers, improving the identification accuracy by more than 10%.
The Problem With Early Cancer Detection
The discovery began, as many breakthroughs do, with an observation that didn't quite make sense. In 1948, two French researchers, Paul Mandel and Pierre Mรฉtais, published a little-noticed paper in a scientific journal. Working in a laboratory in Strasbourg, they had been cataloguing the chemical contents of blood plasma--that river of life teeming with proteins, sugars, waste, nutrients, and cellular debris. Amid this familiar inventory, they'd spotted an unexpected presence: fragments of DNA drifting freely. The finding defied biological orthodoxy. DNA was thought to remain locked inside the nuclei of cells, and not float around on its own.
Bias and Identifiability in the Bounded Confidence Model
Borile, Claudio, Lenti, Jacopo, Ghidini, Valentina, Monti, Corrado, Morales, Gianmarco De Francisci
Opinion dynamics models such as the bounded confidence models (BCMs) describe how a population can reach consensus, fragmentation, or polarization, depending on a few parameters. Connecting such models to real-world data could help understanding such phenomena, testing model assumptions. To this end, estimation of model parameters is a key aspect, and maximum likelihood estimation provides a principled way to tackle it. Here, our goal is to outline the properties of statistical estimators of the two key BCM parameters: the confidence bound and the convergence rate. We find that their maximum likelihood estimators present different characteristics: the one for the confidence bound presents a small-sample bias but is consistent, while the estimator of the convergence rate shows a persistent bias. Moreover, the joint parameter estimation is affected by identifiability issues for specific regions of the parameter space, as several local maxima are present in the likelihood function. Our results show how the analysis of the likelihood function is a fruitful approach for better understanding the pitfalls and possibilities of estimating the parameters of opinion dynamics models, and more in general, agent-based models, and for offering formal guarantees for their calibration.
Differential Privacy in Machine Learning: From Symbolic AI to LLMs
Aguilera-Martรญnez, Francisco, Berzal, Fernando
Machine learning models should not reveal particular information that is not otherwise accessible. Differential privacy provides a formal framework to mitigate privacy risks by ensuring that the inclusion or exclusion of any single data point does not significantly alter the output of an algorithm, thus limiting the exposure of private information. This survey paper explores the foundational definitions of differential privacy, reviews its original formulations and tracing its evolution through key research contributions. It then provides an in-depth examination of how DP has been integrated into machine learning models, analyzing existing proposals and methods to preserve privacy when training ML models. Finally, it describes how DP-based ML techniques can be evaluated in practice. %Finally, it discusses the broader implications of DP, highlighting its potential for public benefit, its real-world applications, and the challenges it faces, including vulnerabilities to adversarial attacks. By offering a comprehensive overview of differential privacy in machine learning, this work aims to contribute to the ongoing development of secure and responsible AI systems.
Real-World Deployment of a Lane Change Prediction Architecture Based on Knowledge Graph Embeddings and Bayesian Inference
Manzour, M., Elias, Catherine M., Shehata, Omar M., Izquierdo, R., Sotelo, M. A.
--Research on lane change prediction has gained a lot of momentum in the last couple of years. However, most research is confined to simulation or results obtained from datasets, leaving a gap between algorithmic advances and on-road deployment. This work closes that gap by demonstrating, on real hardware, a lane-change prediction system based on Knowledge Graph Embeddings (KGEs) and Bayesian inference. Moreover, the ego-vehicle employs a longitudinal braking action to ensure the safety of both itself and the surrounding vehicles. Our architecture consists of two modules: (i) a perception module that senses the environment, derives input numerical features, and converts them into linguistic categories; and communicates them to the prediction module; (ii) a pretrained prediction module that executes a KGE and Bayesian inference model to anticipate the target vehicle's maneuver and transforms the prediction into longitudinal braking action. Real-world hardware experimental validation demonstrates that our prediction system anticipates the target vehicle's lane change three to four seconds in advance, providing the ego vehicle sufficient time to react and allowing the target vehicle to make the lane change safely. Traffic accidents are one of the leading causes of death worldwide. As road networks become more complex and the number of vehicles increases, the ability to anticipate the lane change maneuvers of surrounding vehicles becomes not only beneficial but also essential for enhancing road safety. That's why research on lane change prediction has gained significant momentum in recent years.
Course Project Report: Comparing MCMC and Variational Inference for Bayesian Probabilistic Matrix Factorization on the MovieLens Dataset
This is a course project report with complete methodology, experiments, references and mathematical derivations. Matrix factorization [1] is a widely used technique in recommendation systems. Probabilistic Matrix Factorization (PMF) [2] extends traditional matrix factorization by incorporating probability distributions over latent factors, allowing for uncertainty quantification. However, computing the posterior distribution is intractable due to the high-dimensional integral. To address this, we employ two Bayesian inference methods: Markov Chain Monte Carlo (MCMC) [3, 4] and Variational Inference (VI) [5, 6] to approximate the posterior. We evaluate their performance on MovieLens dataset [7] and compare their convergence speed, predictive accuracy, and computational efficiency. Experimental results demonstrate that VI offers faster convergence, while MCMC provides more accurate posterior estimates.