Bayesian Learning
Review for NeurIPS paper: Bayesian Causal Structural Learning with Zero-Inflated Poisson Bayesian Networks
Weaknesses: The paper emphasizes its focus on causal structure learning. In doing so it assumes "causal sufficiency", that is, it assumes that there are no latent confounders of the measured variables. Generally, there are many latent confounders of the measured variables in most domains. In the past 20 years, there has been substantial progress in developing graphical representations and algorithms for learning equivalence classes of causal networks from observational data. When causal sufficiency is assumed, the learning of DAG structure is generally called Bayesian network structure learning, not causal structural learning, as in the title of the paper. It would be helpful for the paper to more prominently highlight this assumption.
Review for NeurIPS paper: Bayesian Causal Structural Learning with Zero-Inflated Poisson Bayesian Networks
All of the reviewers agree that this paper is both theoretically and modeling-wise a solid contribution to NeurIPS. My only concerns are that some of the author rebuttal points have not made it into the paper -- all of them should be added I think, in particular the related work (extended), the causal sufficiency clarification, and the run times.
Reviews: Parameter elimination in particle Gibbs sampling
The marginalisation of variables within some steps of an MCMC algorithm is delicate. The main proposal here appears well justified, but it would have been nice to see the argument made a little more explicitly. The type of marginalisation described here seems to be more or less what would be described as a (partially) collapsed Gibbs sampler in the sense of [David A Van Dyk and Taeyoung Park. "Partially collapsed Gibbs samplers: Theory and methods"]. It was less clear to me exactly how the "blocking" strategy detailed in Section 4.1 would be justified from a formal perspective, and I do think that this needs clarifying. That is, the collection of variables to be sampled is divided into three parts, x', x and theta, and the decomposition of the kernel seems to involve sampling: (i) x from a kernel invariant with respect to its distribution conditional on both x' and theta (starting from the previous x); (ii) x' from a kernel invariant with respect to its distribution conditional only upon x (starting from the previous x'); (iii) theta from its full conditional distribution. It is not completely transparent how one knows that this is invariant with respect to the correct joint distribution.
Review for NeurIPS paper: Bayesian Deep Learning and a Probabilistic Perspective of Generalization
Summary and Contributions: This paper provides a mix between discussing high-level conceptual ideas and perspectives and presenting a variety of experimental results, all under the umbrella of generalization in (Bayesian) deep learning. More concretely, the central argument of the paper is that Bayesian learning should be primarily viewed as aiming to marginalize over different plausible hypotheses of the data, instead of relying on a single hypothesis (which is what ordinary deep learning is doing). The ultimate goal is thus to accurately estimate the posterior _predictive_ distribution (over outputs), rather than to accurately approximate the posterior distribution (over weights). They thus recommend that Bayesian methods should ideally focus their efforts on carefully representing the posterior distribution in regions that contribute most to the predictive distribution. In this line of thought, they further argue that deep ensembles, one of the state-of-the-art approaches for obtaining well-calibrated predictive distributions, do effectively approximate the Bayesian model average (even if the individual ensemble members are not actually samples from the posterior), and thus should not be considered in competition with Bayesian methods.
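As a rough illustration of the contrast drawn above between relying on a single hypothesis and marginalizing over several, the sketch below averages the class probabilities of a few ensemble members. The logits are made-up numbers and the helper names (`softmax`, `model_average`) are hypothetical; this is an equal-weight average in the spirit of a deep ensemble, not the procedure of the reviewed paper.

```python
import math

def softmax(logits):
    """Convert a list of logits into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def model_average(member_logits):
    """Average per-member class probabilities (not logits) with equal weights."""
    probs = [softmax(logits) for logits in member_logits]
    n_classes = len(probs[0])
    return [sum(p[k] for p in probs) / len(probs) for k in range(n_classes)]

# Hypothetical logits from three ensemble members for one input (assumed values).
member_logits = [[2.0, 0.0, -1.0], [0.5, 1.5, -0.5], [1.0, 1.0, 0.0]]

single = softmax(member_logits[0])    # prediction from one hypothesis
bma = model_average(member_logits)    # approximate predictive average

print("single member:", [round(p, 3) for p in single])
print("ensemble mean:", [round(p, 3) for p in bma])
```

In this toy case the members disagree, so the averaged prediction is less peaked than that of the first member alone, which is the calibration effect the summary alludes to.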
Learning to Help in Multi-Class Settings
Wu, Yu, Li, Yansong, Dong, Zeyu, Sathyavageeswaran, Nitya, Sarwate, Anand D.
Deploying complex machine learning models on resource-constrained devices is challenging due to limited computational power, memory, and model retrainability. To address these limitations, a hybrid system can be established by augmenting the local model with a server-side model, where samples are selectively deferred by a rejector and then sent to the server for processing. The hybrid system enables efficient use of computational resources while minimizing the overhead associated with server usage. The recently proposed Learning to Help (L2H) model trains a server model given a fixed local (client) model, differing from the Learning to Defer (L2D) framework, which trains the client for a fixed (expert) server. In both L2D and L2H, the training includes learning a rejector at the client to determine when to query the server. In this work, we extend the L2H model from binary to multi-class classification problems and demonstrate its applicability in a number of different scenarios of practical interest in which access to the server may be limited by cost, availability, or policy. We derive a stage-switching surrogate loss function that is differentiable, convex, and consistent with the Bayes rule corresponding to the 0-1 loss for the L2H model. Experiments show that our proposed methods offer an efficient and practical solution for multi-class classification in resource-constrained environments.
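A minimal sketch of the inference-time routing described in this abstract, assuming three callables: a local model, a server model, and a rejector that decides when to defer. All three are toy stand-ins invented for illustration (threshold rules on a scalar input); the stage-switching surrogate loss and the training procedure from the paper are not reproduced here.

```python
def hybrid_predict(x, local_model, server_model, rejector):
    """Return (label, used_server): defer to the server only when the rejector fires."""
    if rejector(x):
        return server_model(x), True
    return local_model(x), False

# Toy stand-ins (assumptions, not the paper's models).
def local_model(x):
    """Cheap 3-class rule running on the client."""
    return 0 if x < 1.0 else (1 if x < 2.0 else 2)

def server_model(x):
    """Stand-in for a more capable model with slightly different boundaries."""
    return 0 if x < 0.9 else (1 if x < 2.2 else 2)

def rejector(x, margin=0.25):
    """Defer when the input lies close to one of the local decision boundaries."""
    return min(abs(x - 1.0), abs(x - 2.0)) < margin

inputs = [0.2, 0.95, 1.5, 2.1, 3.0]
results = [hybrid_predict(x, local_model, server_model, rejector) for x in inputs]
labels = [label for label, _ in results]
server_calls = sum(used for _, used in results)

print(labels, "server calls:", server_calls)
```

Only the inputs near a local decision boundary reach the server, which is how the hybrid system limits server usage while keeping the client model fixed.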
EFiGP: Eigen-Fourier Physics-Informed Gaussian Process for Inference of Dynamic Systems
Parameter estimation and trajectory reconstruction for data-driven dynamical systems governed by ordinary differential equations (ODEs) are essential tasks in fields such as biology, engineering, and physics. These inverse problems -- estimating ODE parameters from observational data -- are particularly challenging when the data are noisy and sparse and the dynamics are nonlinear. We propose the Eigen-Fourier Physics-Informed Gaussian Process (EFiGP), an algorithm that integrates Fourier transformation and eigen-decomposition into a physics-informed Gaussian Process framework. This approach eliminates the need for numerical integration, significantly enhancing computational efficiency and accuracy. Built on a principled Bayesian framework, EFiGP incorporates the ODE system through probabilistic conditioning, enforcing governing equations in the Fourier domain while truncating high-frequency terms to achieve denoising and computational savings. The use of eigen-decomposition further simplifies Gaussian Process covariance operations, enabling efficient recovery of trajectories and parameters even in dense-grid settings. We validate the practical effectiveness of EFiGP on three benchmark examples, demonstrating its potential for reliable and interpretable modeling of complex dynamical systems while addressing key challenges in trajectory recovery and computational cost.
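The abstract mentions truncating high-frequency Fourier terms to achieve denoising. The sketch below illustrates only that generic idea on a toy signal, using a plain discrete Fourier transform written with the standard library; it is not the EFiGP algorithm, and the signal, the perturbation, and the cutoff `keep` are assumptions chosen for the example.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def lowpass(x, keep):
    """Zero every DFT coefficient whose frequency index exceeds `keep`."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        freq = min(k, n - k)  # index k and n - k carry the same frequency
        if freq > keep:
            X[k] = 0
    return idft(X)

n = 64
clean = [math.sin(2 * math.pi * t / n) for t in range(n)]
# Deterministic high-frequency perturbation standing in for noise (assumed).
noisy = [c + 0.3 * math.cos(2 * math.pi * 20 * t / n) for t, c in enumerate(clean)]

smooth = lowpass(noisy, keep=3)
err = max(abs(s - c) for s, c in zip(smooth, clean))
print("max reconstruction error:", err)
```

Because the perturbation sits entirely above the cutoff, truncation recovers the slow component up to floating-point error; with real noise the recovery would only be approximate.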
Making Reliable and Flexible Decisions in Long-tailed Classification
Long-tailed classification is challenging due to its heavy imbalance in class probabilities. While existing methods often focus on overall accuracy or accuracy for tail classes, they overlook a critical aspect: certain types of errors can carry greater risks than others in real-world long-tailed problems. For example, misclassifying patients (a tail class) as healthy individuals (a head class) entails far more serious consequences than the reverse scenario. To address this critical issue, we introduce Making Reliable and Flexible Decisions in Long-tailed Classification (RF-DLC), a novel framework aimed at reliable predictions in long-tailed problems. Leveraging Bayesian Decision Theory, we introduce an integrated gain to seamlessly combine long-tailed data distributions and the decision-making procedure. We further propose an efficient variational optimization strategy for the decision risk objective. Our method adapts readily to diverse utility matrices, which can be designed for specific tasks, ensuring its flexibility for different problem settings. In empirical evaluation, we design a new metric, False Head Rate, to quantify tail-sensitivity risk, along with comprehensive experiments on multiple real-world tasks, including large-scale image classification and uncertainty quantification, to demonstrate the reliability and flexibility of our method.
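To make the role of a utility matrix concrete, here is a small sketch of the standard Bayes decision rule: choose the action with the highest expected utility under the predicted class probabilities. The probabilities and utility values are invented for illustration, and the function name `bayes_decision` is hypothetical; the integrated gain and the variational optimization strategy of RF-DLC are not reproduced.

```python
def bayes_decision(probs, utility):
    """Pick the action maximizing expected utility.

    utility[a][y] is the gain of predicting class a when the true class is y.
    """
    expected = [sum(u * p for u, p in zip(row, probs)) for row in utility]
    best = max(range(len(expected)), key=expected.__getitem__)
    return best, expected

# Classes: 0 = healthy (head class), 1 = patient (tail class).
probs = [0.8, 0.2]  # assumed predictive probabilities for one input

zero_one = [[1.0, 0.0],
            [0.0, 1.0]]      # symmetric utility, equivalent to the 0-1 loss
asymmetric = [[1.0, -10.0],
              [0.0, 1.0]]    # predicting "healthy" for a patient is heavily penalized

choice_sym, _ = bayes_decision(probs, zero_one)
choice_asym, expected_asym = bayes_decision(probs, asymmetric)

print("0-1 utility picks class", choice_sym)
print("asymmetric utility picks class", choice_asym, "with expected gains", expected_asym)
```

With the symmetric utility the head class wins on probability alone, while the asymmetric utility flips the decision to the tail class, which mirrors the patient example in the abstract.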
A Data-driven Dynamic Temporal Correlation Modeling Framework for Renewable Energy Scenario Generation
Dong, Xiaochong, Liu, Yilin, Zhang, Xuemin, Mei, Shengwei
Renewable energy power is influenced by the atmospheric system, which exhibits nonlinear and time-varying features. To address this, a dynamic temporal correlation modeling framework is proposed for renewable energy scenario generation. A novel decoupled mapping path is employed for joint probability distribution modeling, formulating regression tasks for both marginal distributions and the correlation structure using proper scoring rules to ensure the rationality of the modeling process. The scenario generation process is divided into two stages. Firstly, the dynamic correlation network models temporal correlations based on a dynamic covariance matrix, capturing the time-varying features of renewable energy while enhancing the interpretability of the black-box model. Secondly, the implicit quantile network models the marginal quantile function in a nonparametric, continuous manner, enabling scenario generation through marginal inverse sampling. Experimental results demonstrate that the proposed dynamic correlation quantile network outperforms state-of-the-art methods in quantifying uncertainty and capturing dynamic correlation for short-term renewable energy scenario generation.
A Semiparametric Bayesian Method for Instrumental Variable Analysis with Partly Interval-Censored Time-to-Event Outcome
Cui, Elvis Han, Lu, Xuyang, Zhou, Jin, Zhou, Hua, Li, Gang
This paper develops a semiparametric Bayesian instrumental variable analysis method for estimating the causal effect of an endogenous variable when dealing with unobserved confounders and measurement errors with partly interval-censored time-to-event data, where event times are observed exactly for some subjects but left-censored, right-censored, or interval-censored for others. Our method is based on a two-stage Dirichlet process mixture instrumental variable (DPMIV) model which simultaneously models the first-stage random error term for the exposure variable and the second-stage random error term for the time-to-event outcome using a bivariate Gaussian Dirichlet process mixture (DPM) model. The DPM model can be broadly understood as a mixture model with an unspecified number of Gaussian components, which relaxes the normal error assumptions and allows the number of mixture components to be determined by the data. We develop an MCMC algorithm for the DPMIV model tailored for partly interval-censored data and conduct extensive simulations to assess the performance of our DPMIV method in comparison with some competing methods. Our simulations show that the proposed method is robust under different error distributions and can outperform its parametric counterpart under various scenarios. We further demonstrate the effectiveness of our approach on a UK Biobank dataset to investigate the causal effect of systolic blood pressure on time-to-development of cardiovascular disease from the onset of diabetes mellitus.