Bayesian Learning
Constraint-based Causal Discovery for Non-Linear Structural Causal Models with Cycles and Latent Confounders
Forré, Patrick, Mooij, Joris M.
We address the problem of causal discovery from data, making use of the recently proposed causal modeling framework of modular structural causal models (mSCM) to handle cycles, latent confounders and non-linearities. We introduce $\sigma$-connection graphs ($\sigma$-CG), a new class of mixed graphs (containing undirected, bidirected and directed edges) with additional structure, and extend the concept of $\sigma$-separation, the appropriate generalization of the well-known notion of d-separation in this setting, to apply to $\sigma$-CGs. We prove the closedness of $\sigma$-separation under marginalisation and conditioning and exploit this to implement a test of $\sigma$-separation on a $\sigma$-CG. This then leads us to the first causal discovery algorithm that can handle non-linear functional relations, latent confounders, cyclic causal relationships, and data from different (stochastic) perfect interventions. As a proof of concept, we show on synthetic data how well the algorithm recovers features of the causal graph of modular structural causal models.
Sampling and Inference for Beta Neutral-to-the-Left Models of Sparse Networks
Bloem-Reddy, Benjamin, Foster, Adam, Mathieu, Emile, Teh, Yee Whye
Empirical evidence suggests that heavy-tailed degree distributions occurring in many real networks are well-approximated by power laws with exponents $\eta$ that may take values either less than or greater than two. Models based on various forms of exchangeability are able to capture power laws with $\eta < 2$, and admit tractable inference algorithms; we draw on previous results to show that $\eta > 2$ cannot be generated by the forms of exchangeability used in existing random graph models. Preferential attachment models generate power law exponents greater than two, but have been of limited use as statistical models due to the inherent difficulty of performing inference in non-exchangeable models. Motivated by this gap, we design and implement inference algorithms for a recently proposed class of models that generates $\eta$ of all possible values. We show that although they are not exchangeable, these models have probabilistic structure amenable to inference. Our methods make a large class of previously intractable models useful for statistical inference.
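The preferential attachment mechanism mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic attachment process in which each new node links to one existing node chosen with probability proportional to its degree; it is not the Beta Neutral-to-the-Left model of the paper, and the function name `preferential_attachment_degrees` is hypothetical.

```python
import random

def preferential_attachment_degrees(n_nodes, seed=0):
    """Grow a tree by degree-proportional attachment; return the degree of each node."""
    rng = random.Random(seed)
    # Start from a single edge between nodes 0 and 1.
    # `endpoints` holds each node once per unit of degree.
    endpoints = [0, 1]
    degrees = [1, 1]
    for new_node in range(2, n_nodes):
        # A uniform draw from `endpoints` selects an existing node
        # with probability proportional to its current degree.
        target = rng.choice(endpoints)
        endpoints.extend([new_node, target])
        degrees.append(1)
        degrees[target] += 1
    return degrees

degrees = preferential_attachment_degrees(5000)
```

Sorting `degrees` in decreasing order and plotting them on log-log axes is one informal way to inspect the heavy tail that such a process produces.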
Pairwise Covariates-adjusted Block Model for Community Detection
One of the most fundamental problems in network analysis is community detection. The stochastic block model (SBM) is a widely used model for network data, for which various estimation methods have been developed and shown to be consistent for community detection. However, the SBM is restricted by the strong assumption that all nodes in the same community are stochastically equivalent, which may not be suitable for practical applications. We introduce the pairwise covariates-adjusted stochastic block model (PCABM), a generalization of the SBM that incorporates pairwise covariate information. We study the maximum likelihood estimates of the coefficients for the covariates as well as the community assignments. It is shown that both the coefficient estimates of the covariates and the community assignments are consistent under suitable sparsity conditions. Spectral clustering with adjustment (SCWA) is introduced to fit PCABM efficiently. Under certain conditions, we derive the error bound of community estimation under SCWA and show that it is consistent for community detection. PCABM compares favorably with the SBM and the degree-corrected stochastic block model (DCBM) on a wide range of simulated and real networks when covariate information is accessible.
Quantifying model form uncertainty in Reynolds-averaged turbulence models with Bayesian deep neural networks
Geneva, Nicholas, Zabaras, Nicholas
Data-driven methods for improving turbulence modeling in Reynolds-Averaged Navier-Stokes (RANS) simulations have gained significant interest in the computational fluid dynamics community. Modern machine learning models have opened up a new area of black-box turbulence models allowing for the tuning of RANS simulations to increase their predictive accuracy. While several data-driven turbulence models have been reported, the quantification of the uncertainties introduced has mostly been neglected. Uncertainty quantification for such data-driven models is essential since their predictive capability rapidly declines as they are tested for flow physics that deviates from that in the training data. In this work, we propose a novel data-driven framework that not only improves RANS predictions but also provides probabilistic bounds for fluid quantities such as velocity and pressure. The uncertainties captured include both model form uncertainty and epistemic uncertainty induced by the limited training data. An invariant Bayesian deep neural network is used to predict the anisotropic tensor component of the Reynolds stress. This model is trained using the Stein variational gradient descent algorithm. The computed uncertainty on the Reynolds stress is propagated to the quantities of interest by vanilla Monte Carlo simulation. Results are presented for two test cases that differ geometrically from the training flows at several different Reynolds numbers. The prediction enhancement of the data-driven model is discussed as well as the associated probabilistic bounds for flow properties of interest. Ultimately this framework allows for a quantitative measurement of model confidence and uncertainty quantification for flows in which no high-fidelity observations or prior knowledge is available.
Automated labeling of bugs and tickets using attention-based mechanisms in recurrent neural networks
Lyubinets, Volodymyr, Boiko, Taras, Nicholas, Deon
We explore solutions for automated labeling of content in bug trackers and customer support systems. To do so, we classify content in terms of several criteria, such as priority or product area. In the first part of the paper, we provide an overview of existing methods used for text classification. These methods fall into two categories: those that rely on neural networks and those that do not. We evaluate the results of several solutions of both kinds. In the second part of the paper, we present our own recurrent neural network solution based on the hierarchical attention paradigm. It consists of several Hierarchical Attention Network blocks with varying Gated Recurrent Unit cell sizes and a complementary shallow network that runs alongside them. Lastly, we evaluate the above-mentioned methods when predicting fields from two datasets: the Arch Linux bug tracker and the Chromium bug tracker. Our contributions include a comprehensive benchmark of a variety of methods on relevant datasets; a novel solution that outperforms previous-generation methods; and two new datasets that are made public for further research.
Machine Learning in High Energy Physics Community White Paper
Albertsson, Kim, Altoe, Piero, Anderson, Dustin, Andrews, Michael, Espinosa, Juan Pedro Araque, Aurisano, Adam, Basara, Laurent, Bevan, Adrian, Bhimji, Wahid, Bonacorsi, Daniele, Calafiura, Paolo, Campanelli, Mario, Capps, Louis, Carminati, Federico, Carrazza, Stefano, Childers, Taylor, Coniavitis, Elias, Cranmer, Kyle, David, Claire, Davis, Douglas, Duarte, Javier, Erdmann, Martin, Eschle, Jonas, Farbin, Amir, Feickert, Matthew, Castro, Nuno Filipe, Fitzpatrick, Conor, Floris, Michele, Forti, Alessandra, Garra-Tico, Jordi, Gemmler, Jochen, Girone, Maria, Glaysher, Paul, Gleyzer, Sergei, Gligorov, Vladimir, Golling, Tobias, Graw, Jonas, Gray, Lindsey, Greenwood, Dick, Hacker, Thomas, Harvey, John, Hegner, Benedikt, Heinrich, Lukas, Hooberman, Ben, Junggeburth, Johannes, Kagan, Michael, Kane, Meghan, Kanishchev, Konstantin, Karpiński, Przemysław, Kassabov, Zahari, Kaul, Gaurav, Kcira, Dorian, Keck, Thomas, Klimentov, Alexei, Kowalkowski, Jim, Kreczko, Luke, Kurepin, Alexander, Kutschke, Rob, Kuznetsov, Valentin, Köhler, Nicolas, Lakomov, Igor, Lannon, Kevin, Lassnig, Mario, Limosani, Antonio, Louppe, Gilles, Mangu, Aashrita, Mato, Pere, Meenakshi, Narain, Meinhard, Helge, Menasce, Dario, Moneta, Lorenzo, Moortgat, Seth, Neubauer, Mark, Newman, Harvey, Pabst, Hans, Paganini, Michela, Paulini, Manfred, Perdue, Gabriel, Perez, Uzziel, Picazio, Attilio, Pivarski, Jim, Prosper, Harrison, Psihas, Fernanda, Radovic, Alexander, Reece, Ryan, Rinkevicius, Aurelius, Rodrigues, Eduardo, Rorie, Jamal, Rousseau, David, Sauers, Aaron, Schramm, Steven, Schwartzman, Ariel, Severini, Horst, Seyfert, Paul, Siroky, Filip, Skazytkin, Konstantin, Sokoloff, Mike, Stewart, Graeme, Stienen, Bob, Stockdale, Ian, Strong, Giles, Thais, Savannah, Tomko, Karen, Upfal, Eli, Usai, Emanuele, Ustyuzhanin, Andrey, Vala, Martin, Vallecorsa, Sofia, Verzetti, Mauro, Vilasís-Cardona, Xavier, Vlimant, Jean-Roch, Vukotic, Ilija, Wang, Sean-Jiun, Watts, Gordon, Williams, Michael, Wu, Wenjing, Wunsch, Stefan, Zapata, Omar
The main objectives of particle physics in the post-Higgs boson discovery era are to exploit the full physics potential of both the Large Hadron Collider (LHC) and its upgrade, the high luminosity LHC (HL-LHC), in addition to present and future neutrino experiments. The HL-LHC will deliver data from 100 times the luminosity compared to the LHC, bringing quantitatively and qualitatively new challenges due to event size, data volume, and complexity. The physics reach of the experiments will be limited by the physics performance of algorithms and computational resources. Machine learning (ML) applied to particle physics promises to provide improvements in both of these areas. Incorporating machine learning in particle physics workflows will require significant research and development over the next five years. Areas where significant improvements are needed include: physics performance of reconstruction and analysis algorithms; execution time of computationally expensive parts of event simulation, pattern recognition, and calibration; real-time implementation of machine learning algorithms; and reduction of the data footprint with data compression, placement and access.
The modal age of Statistics
The mean-median-mode trio comprises the three most frequently used measures of central tendency of a dataset. They are taught within the very first classes of any course on basic Statistics. However, they do not share the same degree of importance: the sample mean (or average) is normally well understood and employed in everyday situations, the sample median is also useful and easy to visualize, but the mode, usually defined as the value of the dataset having the highest frequency of appearance, looks like a more bizarre measure of location. This uneven treatment was already noted by Dalenius (1965), but it persists today, to some extent. Indeed, when the dataset consists of realizations from a continuous random variable, all the observed values are different with probability one and, therefore, the mode does not even make much sense.
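The last point of the abstract above can be made concrete with a short sketch. It draws a sample from a continuous distribution, shows that the frequency-based mode is uninformative because the values do not repeat, and then falls back on a simple histogram-based estimate. The helper `histogram_mode` and the bin width of 0.5 are assumptions chosen for illustration, not a method taken from the paper.

```python
import math
import random
import statistics
from collections import Counter

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# With continuous data the observed values are almost surely all distinct,
# so every observation ties for the highest frequency of appearance.
modes = statistics.multimode(sample)

def histogram_mode(values, bin_width):
    """Return the midpoint of the most populated bin of width `bin_width`."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    fullest_bin, _ = counts.most_common(1)[0]
    return (fullest_bin + 0.5) * bin_width

estimate = histogram_mode(sample, bin_width=0.5)
```

The estimate depends on the chosen bin width, which is one reason the mode of a continuous sample calls for a smoothing-based definition rather than a counting-based one.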
A Tutorial on Bayesian Optimization
Bayesian optimization is an approach to optimizing objective functions that take a long time (minutes or hours) to evaluate. It is best-suited for optimization over continuous domains of fewer than 20 dimensions, and tolerates stochastic noise in function evaluations. It builds a surrogate for the objective and quantifies the uncertainty in that surrogate using a Bayesian machine learning technique, Gaussian process regression, and then uses an acquisition function defined from this surrogate to decide where to sample. In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We then discuss more advanced techniques, including running multiple function evaluations in parallel, multi-fidelity and multi-information source optimization, expensive-to-evaluate constraints, random environmental conditions, multi-task Bayesian optimization, and the inclusion of derivative information. We conclude with a discussion of Bayesian optimization software and future research directions in the field. Within our tutorial material we provide a generalization of expected improvement to noisy evaluations, beyond the noise-free setting where it is more commonly applied. This generalization is justified by a formal decision-theoretic argument, standing in contrast to previous ad hoc modifications.
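As a rough illustration of how an acquisition function decides where to sample, the sketch below evaluates the standard closed-form expected improvement for the noise-free case. It assumes a maximization problem and a Gaussian posterior with mean `mu` and standard deviation `sigma` at each candidate point; the candidate values and the name `expected_improvement` are made up for the example, and the Gaussian process fit that would normally supply `mu` and `sigma` is omitted.

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization under a posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf

# Hypothetical posterior (mean, standard deviation) at three candidate points.
candidates = {"a": (0.9, 0.05), "b": (0.7, 0.40), "c": (1.0, 0.0)}
best_so_far = 1.0

scores = {name: expected_improvement(m, s, best_so_far)
          for name, (m, s) in candidates.items()}
next_point = max(scores, key=scores.get)
```

In this example the point with the largest posterior uncertainty scores highest even though its mean is the lowest, which shows how expected improvement trades off exploration against exploitation.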
BALSON: Bayesian Least Squares Optimization with Nonnegative L1-Norm Constraint
Xie, Jiyang, Ma, Zhanyu, Zhang, Guoqiang, Xue, Jing-Hao, Chien, Jen-Tzung, Lin, Zhiqing, Guo, Jun
A Bayesian approach termed BAyesian Least Squares Optimization with Nonnegative L1-norm constraint (BALSON) is proposed. The error distribution of data fitting is described by Gaussian likelihood. The parameter distribution is assumed to be a Dirichlet distribution. With the Bayes rule, searching for the optimal parameters is equivalent to finding the mode of the posterior distribution. In order to explicitly characterize the nonnegative L1-norm constraint of the parameters, we further approximate the true posterior distribution by a Dirichlet distribution. We estimate the statistics of the approximating Dirichlet posterior distribution by sampling methods. Four sampling methods have been introduced. With the estimated posterior distributions, the original parameters can be effectively reconstructed in polynomial fitting problems, and the BALSON framework is found to perform better than conventional methods.
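The setup described in the abstract above, a Gaussian likelihood combined with a Dirichlet distribution over nonnegative parameters that sum to one, can be sketched with plain importance sampling. This is a simplified illustration and not one of the four sampling methods of the paper: it draws parameters from the Dirichlet prior, weights them by the Gaussian likelihood, and reports the weighted posterior mean instead of fitting an approximating Dirichlet posterior. All names, the synthetic data, and the noise level are assumptions.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw one point on the simplex by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def posterior_mean_on_simplex(X, y, alpha, noise_sd, n_samples=5000, seed=0):
    """Importance-sampling estimate of the posterior mean of the weights."""
    rng = random.Random(seed)
    samples, log_weights = [], []
    for _ in range(n_samples):
        w = sample_dirichlet(alpha, rng)
        sse = sum((yi - sum(wj * xj for wj, xj in zip(w, xi))) ** 2
                  for xi, yi in zip(X, y))
        samples.append(w)
        # Gaussian log-likelihood up to an additive constant.
        log_weights.append(-0.5 * sse / noise_sd ** 2)
    top = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return [sum(wt * s[j] for wt, s in zip(weights, samples)) / total
            for j in range(len(alpha))]

# Synthetic data generated from hypothetical true weights on the simplex.
data_rng = random.Random(1)
true_w = [0.7, 0.2, 0.1]
X = [[data_rng.random() for _ in true_w] for _ in range(30)]
y = [sum(w * x for w, x in zip(true_w, xi)) + data_rng.gauss(0.0, 0.1) for xi in X]

estimate = posterior_mean_on_simplex(X, y, alpha=[1.0, 1.0, 1.0], noise_sd=0.1)
```

Because every sample lies on the simplex, the estimate is nonnegative and sums to one by construction, so the L1-norm constraint never has to be enforced separately.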
Improving Deep Learning through Automatic Programming
Deep learning and deep architectures are emerging as the best machine learning methods so far in many practical applications, such as reducing the dimensionality of data, image classification, speech recognition, and object segmentation. In fact, many leading technology companies such as Google, Microsoft, and IBM are researching and using deep architectures in their systems to replace other traditional models. Therefore, improving the performance of these models could have a very strong impact in the area of machine learning. However, deep learning is a very fast-growing research domain with many core methodologies and paradigms discovered only over the last few years. This thesis first serves as a short summary of deep learning, which tries to include all of the most important ideas in this research area. Based on this knowledge, we proposed and conducted some experiments to investigate the possibility of improving deep learning using automatic programming (ADATE). Although our experiments did produce good results, there are still many more possibilities that we could not try due to limited time as well as some limitations of the current ADATE version. I hope that this thesis can promote future work on this topic, especially when the next version of ADATE comes out. This thesis also includes a short analysis of the power of the ADATE system, which could be very useful for other researchers who want to know what it is capable of.