Bayesian Inference
Activity Detection for Massive Connectivity in Cell-free Networks with Unknown Large-scale Fading, Channel Statistics, Noise Variance, and Activity Probability: A Bayesian Approach
Zhang, Hao, Lin, Qingfeng, Li, Yang, Cheng, Lei, Wu, Yik-Chung
Activity detection is an important task in the next generation grant-free multiple access. While there are a number of existing algorithms designed for this purpose, they mostly require precise information about the network, such as large-scale fading coefficients, small-scale fading channel statistics, noise variance at the access points, and user activity probability. Acquiring these information would take a significant overhead and their estimated values might not be accurate. This problem is even more severe in cell-free networks as there are many of these parameters to be acquired. Therefore, this paper sets out to investigate the activity detection problem without the above-mentioned information. In order to handle so many unknown parameters, this paper employs the Bayesian approach, where the unknown variables are endowed with prior distributions which effectively act as regularizations. Together with the likelihood function, a maximum a posteriori (MAP) estimator and a variational inference algorithm are derived. Extensive simulations demonstrate that the proposed methods, even without the knowledge of these system parameters, perform better than existing state-of-the-art methods, such as covariance-based and approximate message passing methods.
CreINNs: Credal-Set Interval Neural Networks for Uncertainty Estimation in Classification Tasks
Wang, Kaizheng, Shariatmadar, Keivan, Manchingal, Shireen Kudukkil, Cuzzolin, Fabio, Moens, David, Hallez, Hans
Uncertainty estimation is increasingly attractive for improving the reliability of neural networks. In this work, we present novel credal-set interval neural networks (CreINNs) designed for classification tasks. CreINNs preserve the traditional interval neural network structure, capturing weight uncertainty through deterministic intervals, while forecasting credal sets using the mathematical framework of probability intervals. Experimental validations on an out-of-distribution detection benchmark (CIFAR10 vs SVHN) showcase that CreINNs outperform epistemic uncertainty estimation when compared to variational Bayesian neural networks (BNNs) and deep ensembles (DEs). Furthermore, CreINNs exhibit a notable reduction in computational complexity compared to variational BNNs and demonstrate smaller model sizes than DEs.
Graph Neural Networks with a Distribution of Parametrized Graphs
Lee, See Hian, Ji, Feng, Xia, Kelin, Tay, Wee Peng
Traditionally, graph neural networks have been trained using a single observed graph. However, the observed graph represents only one possible realization. In many applications, the graph may encounter uncertainties, such as having erroneous or missing edges, as well as edge weights that provide little informative value. To address these challenges and capture additional information previously absent in the observed graph, we introduce latent variables to parameterize and generate multiple graphs. We obtain the maximum likelihood estimate of the network parameters in an Expectation-Maximization (EM) framework based on the multiple graphs. Specifically, we iteratively determine the distribution of the graphs using a Markov Chain Monte Carlo (MCMC) method, incorporating the principles of PAC-Bayesian theory. Numerical experiments demonstrate improvements in performance against baseline models on node classification for heterogeneous graphs and graph regression on chemistry datasets.
Local and Global Trend Bayesian Exponential Smoothing Models
Smyl, Slawek, Bergmeir, Christoph, Dokumentov, Alexander, Long, Xueying, Wibowo, Erwin, Schmidt, Daniel
This paper describes a family of seasonal and non-seasonal time series models that can be viewed as generalisations of additive and multiplicative exponential smoothing models, to model series that grow faster than linear but slower than exponential. Their development is motivated by fast-growing, volatile time series. In particular, our models have a global trend that can smoothly change from additive to multiplicative, and is combined with a linear local trend. Seasonality when used is multiplicative in our models, and the error is always additive but is heteroscedastic and can grow through a parameter sigma. We leverage state-of-the-art Bayesian fitting techniques to accurately fit these models that are more complex and flexible than standard exponential smoothing models. When applied to the M3 competition data set, our models outperform the best algorithms in the competition as well as other benchmarks, thus achieving to the best of our knowledge the best results of per-series univariate methods on this dataset in the literature. An open-source software package of our method is available.
Learning Directed Graphical Models with Optimal Transport
Vo, Vy, Le, Trung, Vuong, Long-Tung, Zhao, He, Bonilla, Edwin, Phung, Dinh
Estimating the parameters of a probabilistic directed graphical model from incomplete data remains a long-standing challenge. This is because, in the presence of latent variables, both the likelihood function and posterior distribution are intractable without further assumptions about structural dependencies or model classes. While existing learning methods are fundamentally based on likelihood maximization, here we offer a new view of the parameter learning problem through the lens of optimal transport. This perspective licenses a general framework that operates on any directed graphs without making unrealistic assumptions on the posterior over the latent variables or resorting to black-box variational approximations. We develop a theoretical framework and support it with extensive empirical evidence demonstrating the flexibility and versatility of our approach. Across experiments, we show that not only can our method recover the ground-truth parameters but it also performs comparably or better on downstream applications, notably the non-trivial task of discrete representation learning.
Compositional Generative Modeling: A Single Model is Not All You Need
Large monolithic generative models trained on massive amounts of data have become an increasingly dominant approach in AI research. In this paper, we argue that we should instead construct large generative systems by composing smaller generative models together. We show how such a compositional generative approach enables us to learn distributions in a more data-efficient manner, enabling generalization to parts of the data distribution unseen at training time. We further show how this enables us to program and construct new generative models for tasks completely unseen at training. Finally, we show that in many cases, we can discover separate compositional components from data.
Bayesian Causal Inference with Gaussian Process Networks
Giudice, Enrico, Kuipers, Jack, Moffa, Giusi
Quantifying the causal relationships from purely observational data between variables in a system is a problem that has attracted great attention in the fields of statistics and machine learning. Full knowledge of the causal relations allows predicting the outcome of direct manipulations on the system, which can generally only be known from interventional data obtained by performing experiments such as randomized controlled trials (Eberhardt and Scheines, 2007). Predicting the effect of such manipulations without the need of costly or infeasible experiments is of great practical relevance, specifically in the fields of computational biology (Sachs et al., 2005), medicine (Richens et al., 2020) or AI (Schölkopf, 2022), since a central question concerns how a complex system will react to some treatment or outside influence of the user. Pearl's rules of do-calculus (Pearl, 2000) allow computing the intervention distributions resulting from these external manipulations from the joint distribution of the set of random variables together with a Directed Acyclic Graph (DAG). The DAG represents the qualitative causal relationships among the variables; each node in the graph represents a variable and a directed edge indicates a direct causal effect. Probabilistic models that are based on such DAGs, commonly called causal Bayesian Networks (BNs), provide conventional grounds for probabilistic causal inference, due to their compact representation of the joint distribution and their intuitive graphical description of the causal structure.
Adaptive Crowdsourcing Via Self-Supervised Learning
Kagrecha, Anmol, Marklund, Henrik, Van Roy, Benjamin, Jeon, Hong Jun, Zeckhauser, Richard
Common crowdsourcing systems average estimates of a latent quantity of interest provided by many crowdworkers to produce a group estimate. We develop a new approach -- predict-each-worker -- that leverages self-supervised learning and a novel aggregation scheme. This approach adapts weights assigned to crowdworkers based on estimates they provided for previous quantities. When skills vary across crowdworkers or their estimates correlate, the weighted sum offers a more accurate group estimate than the average. Existing algorithms such as expectation maximization can, at least in principle, produce similarly accurate group estimates. However, their computational requirements become onerous when complex models, such as neural networks, are required to express relationships among crowdworkers. Predict-each-worker accommodates such complexity as well as many other practical challenges. We analyze the efficacy of predict-each-worker through theoretical and computational studies. Among other things, we establish asymptotic optimality as the number of engagements per crowdworker grows.
Comprehensive Exploration of Synthetic Data Generation: A Survey
Bauer, André, Trapp, Simon, Stenger, Michael, Leppich, Robert, Kounev, Samuel, Leznik, Mark, Chard, Kyle, Foster, Ian
Recent years have witnessed a surge in the popularity of Machine Learning (ML), applied across diverse domains. However, progress is impeded by the scarcity of training data due to expensive acquisition and privacy legislation. Synthetic data emerges as a solution, but the abundance of released models and limited overview literature pose challenges for decision-making. This work surveys 417 Synthetic Data Generation (SDG) models over the last decade, providing a comprehensive overview of model types, functionality, and improvements. Common attributes are identified, leading to a classification and trend analysis. The findings reveal increased model performance and complexity, with neural network-based approaches prevailing, except for privacy-preserving data generation. Computer vision dominates, with GANs as primary generative models, while diffusion models, transformers, and RNNs compete. Implications from our performance evaluation highlight the scarcity of common metrics and datasets, making comparisons challenging. Additionally, the neglect of training and computational costs in literature necessitates attention in future research. This work serves as a guide for SDG model selection and identifies crucial areas for future exploration.
EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling
Ren, Siyu, Wu, Zhiyong, Zhu, Kenny Q.
Neural language models are probabilistic models of human text. They are predominantly trained using maximum likelihood estimation (MLE), which is equivalent to minimizing the forward cross-entropy between the empirical data distribution and the model distribution. However, various degeneration phenomena are still widely observed when decoding from the distributions learned by such models. We establish that the forward cross-entropy is suboptimal as a distance metric for aligning human and model distribution due to its (1) recall-prioritization (2) negative diversity ignorance and (3) train-test mismatch. In this paper, we propose Earth Mover Distance Optimization (EMO) for auto-regressive language modeling. EMO capitalizes on the inherent properties of earth mover distance to address the aforementioned challenges. Due to the high complexity of direct computation, we further introduce a feasible upper bound for EMO to ease end-to-end training. Upon extensive evaluation of language models trained using EMO and MLE. We find that EMO demonstrates a consistently better language modeling performance than MLE across domains. Moreover, EMO demonstrates noteworthy enhancements in downstream performance with minimal fine-tuning on merely 25,000 sentences. This highlights the tremendous potential of EMO as a lightweight calibration method for enhancing large-scale pre-trained language models.