Bayesian Learning
Review for NeurIPS paper: Distributionally Robust Parametric Maximum Likelihood Estimation
Since everything is parametric, I'd expect explicit rates of convergence involving all problem complexity parameters (n, m, p, etc.). To make the rest of my points clear, let me recall the notation used in the paper: - n: the dimensionality of the covariate (i.e., feature vector) X. Thus X is a random vector in R^n. BTW, in the context of ML or stats, I'd use another notation here, as n conventionally stands for "sample size".
Review for NeurIPS paper: Distributionally Robust Parametric Maximum Likelihood Estimation
This paper proposes a method for distributionally robust optimization under KL ambiguity sets for exponential families. Although KL ambiguity sets have their drawbacks, in particular not covering any changes in the inputs x, the present work produces a standard conic problem for a wide problem class via a novel analysis, provides good theoretical analysis, and yields good numerical results for a variety of small-scale classification problems. With the various clarifications that came up in the reviews, this paper makes a solid contribution to the DRO literature and will be quite welcome to the NeurIPS audience.
Reviews: A state-space model for inferring effective connectivity of latent neural dynamics from simultaneous EEG/fMRI
This paper develops a novel method to infer directional relationships between cortical areas of the brain based on simultaneously acquired EEG and fMRI data. Specifically, the fMRI activations are used to select ROIs related to the paradigm of interest. This information is used in a coupled state-space and forward propagation model to identify robust spatial sources and directional connectivity. The authors use a variational Bayesian framework to infer the latent posteriors and noise covariances. They demonstrate the power of joint EEG/fMRI analysis using two simulated experiments and a real-world dataset.
Review for NeurIPS paper: Towards Scalable Bayesian Learning of Causal DAGs
Weaknesses: The novelty of the paper is very limited. The authors concentrate on computational tricks that try to improve the scalability of the algorithm, and they achieve some success. However, for a NIPS paper I would expect not only an improved implementation of the algorithm but also some new concepts. I did not find any new ideas in that sense.
Reduced-order modeling and classification of hydrodynamic pattern formation in gravure printing
Rothmann-Brumm, Pauline, Brunton, Steven L., Scherl, Isabel
Hydrodynamic pattern formation phenomena in printing and coating processes are still not fully understood. However, fundamental understanding is essential to achieve high-quality printed products and to tune printed patterns according to the needs of a specific application like printed electronics, graphical printing, or biomedical printing. The aim of the paper is to develop an automated pattern classification algorithm based on methods from supervised machine learning and reduced-order modeling. We use the HYPA-p dataset, a large image dataset of gravure-printed images, which shows various types of hydrodynamic pattern formation phenomena. It enables the correlation of printing process parameters and resulting printed patterns for the first time. 26880 images of the HYPA-p dataset have been labeled by a human observer as dot patterns, mixed patterns, or finger patterns; 864000 images (97%) are unlabeled. A singular value decomposition (SVD) is used to find the modes of the labeled images and to reduce the dimensionality of the full dataset by truncation and projection. Selected machine learning classification techniques are trained on the reduced-order data. We investigate the effect of several factors, including classifier choice, whether or not fast Fourier transform (FFT) is used to preprocess the labeled images, data balancing, and data normalization. The best performing model is a k-nearest neighbor (kNN) classifier trained on unbalanced, FFT-transformed data with a test error of 3%, which outperforms a human observer by 7%. Data balancing slightly increases the test error of the kNN-model to 5%, but also increases the recall of the mixed class from 90% to 94%. Finally, we demonstrate how the trained models can be used to predict the pattern class of unlabeled images and how the predictions can be correlated to the printing process parameters, in the form of regime maps.
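The SVD truncation, projection, and kNN classification steps described in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the data generator, the mean-centering step, the rank of 5, k = 5, and all function names are assumptions made here for the example, and the FFT preprocessing and data balancing studied in the paper are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for flattened images: 3 classes, 64 "pixels" each (assumed).
n_per_class, n_pixels, n_classes = 60, 64, 3
centers = 5.0 * rng.normal(size=(n_classes, n_pixels))
X = np.vstack([c + rng.normal(size=(n_per_class, n_pixels)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# Random split into labeled training data and held-out test data.
idx = rng.permutation(len(y))
train, test = idx[:120], idx[120:]

def fit_svd_basis(X_train, rank):
    """Return the training mean and the leading `rank` SVD modes."""
    mean = X_train.mean(axis=0)
    # Economy SVD of the centered data matrix (rows are flattened images).
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:rank].T  # basis has shape (n_pixels, rank)

def project(X_any, mean, basis):
    """Reduce dimensionality by projecting onto the truncated basis."""
    return (X_any - mean) @ basis

def knn_predict(Z_train, y_train, Z_query, k=5):
    """Majority vote among the k nearest training points (Euclidean)."""
    d2 = ((Z_query[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])

mean, basis = fit_svd_basis(X[train], rank=5)
Z_train = project(X[train], mean, basis)
Z_test = project(X[test], mean, basis)

pred = knn_predict(Z_train, y[train], Z_test, k=5)
accuracy = float(np.mean(pred == y[test]))
print(f"test accuracy on synthetic data: {accuracy:.2f}")
```

The basis is fitted on the labeled training rows only and then reused to project any other image, which matches the abstract's description of finding modes from labeled images and applying them to the full dataset.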
Hierarchical Count Echo State Network Models with Application to Graduate Student Enrollments
Wang, Qi, Parker, Paul A., Lund, Robert B.
Poisson autoregressive count models have evolved into a time series staple for correlated count data. This paper proposes an alternative to Poisson autoregressions: count echo state networks. Echo state networks can be analyzed statistically in a frequentist manner by optimizing penalized likelihoods, or in a Bayesian manner via MCMC sampling. This paper develops Poisson echo state techniques for count data and applies them to a massive count data set containing the number of graduate students from 1,758 United States universities during the years 1972-2021 inclusive. Negative binomial models are also implemented to better handle overdispersion in the counts. The proposed models are compared via their forecasting performance as judged by several methods. In the end, a hierarchical negative binomial based echo state network is judged as the superior model.
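As a rough illustration of the frequentist route mentioned in the abstract, the sketch below drives a fixed random reservoir with lagged counts and fits a Poisson log-link readout by minimizing a ridge-penalized negative log-likelihood. It is not the paper's model: the synthetic series, reservoir size, spectral radius, penalty weight, and the simple step-halving gradient descent are all assumptions chosen for the example, and the hierarchical and negative binomial variants are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic count series (illustrative only): Poisson counts with a seasonal mean.
T = 300
t = np.arange(T)
y = rng.poisson(10 + 5 * np.sin(2 * np.pi * t / 12))

# Fixed random reservoir; only the readout weights are trained.
n_res = 20
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius 0.9
u = rng.uniform(-0.5, 0.5, size=n_res)

# Reservoir states driven by the lagged, log-transformed count.
H = np.zeros((T, n_res))
h = np.zeros(n_res)
for i in range(1, T):
    h = np.tanh(W @ h + u * np.log1p(y[i - 1]))
    H[i] = h

# Design matrix: intercept plus reservoir state; targets are y[1], ..., y[T-1].
X = np.column_stack([np.ones(T - 1), H[1:]])
target = y[1:]
alpha = 1e-2  # ridge penalty weight (assumed)

def penalized_nll(beta):
    """Mean Poisson negative log-likelihood (constant dropped) plus ridge term."""
    eta = X @ beta
    return np.mean(np.exp(eta) - target * eta) + 0.5 * alpha * np.sum(beta[1:] ** 2)

def gradient(beta):
    eta = X @ beta
    g = X.T @ (np.exp(eta) - target) / len(target)
    g[1:] += alpha * beta[1:]  # the intercept is not penalized
    return g

# Gradient descent that only accepts steps which do not increase the loss.
beta = np.zeros(n_res + 1)
beta[0] = np.log(target.mean())
initial_loss = penalized_nll(beta)
step = 0.05
for _ in range(500):
    candidate = beta - step * gradient(beta)
    if penalized_nll(candidate) <= penalized_nll(beta):
        beta = candidate
    else:
        step *= 0.5

final_loss = penalized_nll(beta)
fitted_mean = np.exp(X @ beta)  # one-step-ahead Poisson means
print(f"penalized NLL: {initial_loss:.4f} -> {final_loss:.4f}")
```

Because the reservoir weights stay fixed, the fitting problem reduces to a penalized Poisson regression on the reservoir states, which is what makes a likelihood-based (or, alternatively, MCMC-based) analysis tractable.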
Causal Discovery via Bayesian Optimization
Duong, Bao, Gupta, Sunil, Nguyen, Thin
Existing score-based methods for directed acyclic graph (DAG) learning from observational data struggle to recover the causal graph accurately and sample-efficiently. To overcome this, we propose DrBO (DAG recovery via Bayesian Optimization), a novel DAG learning framework leveraging Bayesian optimization (BO) to find high-scoring DAGs. We show that, by carefully choosing which promising DAGs to explore, we can find higher-scoring ones much more efficiently. To address the scalability issues of conventional BO in DAG learning, we replace the Gaussian processes commonly employed in BO with dropout neural networks, trained in a continual manner, which allows for (i) flexibly modeling the DAG scores without overfitting, (ii) incorporating uncertainty into the estimated scores, and (iii) scaling with the number of evaluations. As a result, DrBO is computationally efficient and can find an accurate DAG in fewer trials and less time than existing state-of-the-art methods. This is demonstrated through an extensive set of empirical evaluations on many challenging settings with both synthetic and real data. Our implementation is available at https://github.com/baosws/DrBO.
Reviews: Bayesian Learning of Sum-Product Networks
Given the space constraint of the rebuttal, I will trust the authors to indeed incorporate the changes as promised, and given this I increased my score. However, at several places in this paper, it is too dense to follow. More detailed comments are as follows. First, this paper lacks a dedicated related work section. There is some brief discussion about how this work differs from existing literature, in the introduction, yet it is not enough.