Bayesian Inference
Structural Learning of Probabilistic Graphical Models of Cumulative Phenomena
Ramazzotti, Daniele, Nobile, Marco S., Antoniotti, Marco, Graudenzi, Alex
One of the critical issues when adopting Bayesian networks (BNs) to model dependencies among random variables is to "learn" their structure. This is a well-known NP-hard problem in its most general and classical formulation, which is furthermore complicated by known pitfalls such as the issue of I-equivalence among different structures. In this work we restrict the investigation to a specific class of networks, i.e., those representing the dynamics of phenomena characterized by the monotonic accumulation of events. Such phenomena allow to set specific structural constraints based on Suppes' theory of probabilistic causation and, accordingly, to define constrained BNs, named Suppes-Bayes Causal Networks (SBCNs). Within this framework, we study the structure learning of SBCNs via extensive simulations with various state-of-the-art search strategies, such as canonical local search techniques and Genetic Algorithms. This investigation is intended to be an extension and an in-depth clarification of our previous works on SBCN structure learning. Among the main results, we show that Suppes' constraints do simplify the learning task, by reducing the solution search space and providing a temporal ordering on the variables, which simplifies the complications derived by I-equivalent structures. Finally, we report on tradeoffs among different optimization techniques that can be used to learn SBCNs.
Nonparametric Bayesian label prediction on a large graph using truncated Laplacian regularization
Hartog, Jarno, van Zanten, Harry
This article describes an implementation of a nonparametric Bayesian approach to solving binary classification problems on graphs. We consider a hierarchical Bayesian approach with a prior that is constructed by truncating a series expansion of the soft label function using the graph Laplacian eigenfunctions as basis functions. We compare our truncated prior to the untruncated Laplacian based prior in simulated and real data examples to illustrate the improved scalability in terms of size of the underlying graph.
A Latent Gaussian Mixture Model for Clustering Longitudinal Data
Bierling, Vanessa S. E., McNicholas, Paul D.
Finite mixture models have become a popular tool for clustering. Amongst other uses, they have been applied for clustering longitudinal data and clustering high-dimensional data. In the latter case, a latent Gaussian mixture model is sometimes used. Although there has been much work on clustering using latent variables and on clustering longitudinal data, respectively, there has been a paucity of work that combines these features. An approach is developed for clustering longitudinal data with many time points based on an extension of the mixture of common factor analyzers model. A variation of the expectation-maximization algorithm is used for parameter estimation and the Bayesian information criterion is used for model selection. The approach is illustrated using real and simulated data.
Fast Counting in Machine Learning Applications
Karan, Subhadeep, Eichhorn, Matthew, Hurlburt, Blake, Iraci, Grant, Zola, Jaroslaw
We propose scalable methods to execute counting queries in machine learning applications. To achieve memory and computational efficiency, we abstract counting queries and their context such that the counts can be aggregated as a stream. We demonstrate performance and scalability of the resulting approach on random queries, and through extensive experimentation using Bayesian networks learning and association rule mining. Our methods significantly outperform commonly used ADtrees and hash tables, and are practical alternatives for processing large-scale data.
Fast Gaussian Process Based Gradient Matching for Parameter Identification in Systems of Nonlinear ODEs
Wenk, Philippe, Gotovos, Alkis, Bauer, Stefan, Gorbach, Nico, Krause, Andreas, Buhmann, Joachim M.
Parameter identification and comparison of dynamical systems is a challenging task in many fields. Bayesian approaches based on Gaussian process regression over time-series data have been successfully applied to infer the parameters of a dynamical system without explicitly solving it. While the benefits in computational cost are well established, a rigorous mathematical framework has been missing. We offer a novel interpretation which leads to a better understanding and improvements in state-of-the-art performance in terms of accuracy for nonlinear dynamical systems.
Solving Bongard Problems with a Visual Language and Pragmatic Reasoning
Depeweg, Stefan, Rothkopf, Constantin A., Jäkel, Frank
More than 50 years ago Bongard introduced 100 visual concept learning problems as a testbed for intelligent vision systems. These problems are now known as Bongard problems. Although they are well known in the cognitive science and AI communities only moderate progress has been made towards building systems that can solve a substantial subset of them. In the system presented here, visual features are extracted through image processing and then translated into a symbolic visual vocabulary. We introduce a formal language that allows representing complex visual concepts based on this vocabulary. Using this language and Bayesian inference, complex visual concepts can be induced from the examples that are provided in each Bongard problem. Contrary to other concept learning problems the examples from which concepts are induced are not random in Bongard problems, instead they are carefully chosen to communicate the concept, hence requiring pragmatic reasoning. Taking pragmatic reasoning into account we find good agreement between the concepts with high posterior probability and the solutions formulated by Bongard himself. While this approach is far from solving all Bongard problems, it solves the biggest fraction yet.
Towards Training Probabilistic Topic Models on Neuromorphic Multi-chip Systems
Xiao, Zihao, Chen, Jianfei, Zhu, Jun
Probabilistic topic models are popular unsupervised learning methods, including probabilistic latent semantic indexing (pLSI) and latent Dirichlet allocation (LDA). By now, their training is implemented on general purpose computers (GPCs), which are flexible in programming but energy-consuming. Towards low-energy implementations, this paper investigates their training on an emerging hardware technology called the neuromorphic multi-chip systems (NMSs). NMSs are very effective for a family of algorithms called spiking neural networks (SNNs). We present three SNNs to train topic models. The first SNN is a batch algorithm combining the conventional collapsed Gibbs sampling (CGS) algorithm and an inference SNN to train LDA. The other two SNNs are online algorithms targeting at both energy- and storage-limited environments. The two online algorithms are equivalent with training LDA by using maximum-a-posterior estimation and maximizing the semi-collapsed likelihood, respectively. They use novel, tailored ordinary differential equations for stochastic optimization. We simulate the new algorithms and show that they are comparable with the GPC algorithms, while being suitable for NMS implementation. We also propose an extension to train pLSI and a method to prune the network to obey the limited fan-in of some NMSs.
CoT: Cooperative Training for Generative Modeling
Lu, Sidi, Yu, Lantao, Zhang, Weinan, Yu, Yong
We propose Cooperative Training (CoT) for training generative models that measure a tractable density function for target data. CoT coordinately trains a generator $G$ and an auxiliary predictive mediator $M$. The training target of $M$ is to estimate a mixture density of the learned distribution $G$ and the target distribution $P$, and that of $G$ is to minimize the Jensen-Shannon divergence estimated through $M$. CoT achieves independent success without the necessity of pre-training via Maximum Likelihood Estimation or involving high-variance algorithms like REINFORCE. This low-variance algorithm is theoretically proved to be unbiased for both generative and predictive tasks. We also theoretically and empirically show the superiority of CoT over most previous algorithms, in terms of generative quality and diversity, predictive generalization ability and computational cost.
Multimodal Sparse Bayesian Dictionary Learning
Fedorov, Igor, Rao, Bhaskar D.
The purpose of this paper is to address the problem of learning dictionaries for multimodal datasets, i.e. datasets collected from multiple data sources. We present an algorithm called multimodal sparse Bayesian dictionary learning (MSBDL). The MSBDL algorithm is able to leverage information from all available data modalities through a joint sparsity constraint on each modality's sparse codes without restricting the coefficients themselves to be equal. Our framework offers a considerable amount of flexibility to practitioners and addresses many of the shortcomings of existing multimodal dictionary learning approaches. Unlike existing approaches, MSBDL allows the dictionaries for each data modality to have different cardinality. In addition, MSBDL can be used in numerous scenarios, from small datasets to extensive datasets with large dimensionality. MSBDL can also be used in supervised settings and allows for learning multimodal dictionaries concurrently with classifiers for each modality.
A review of possible effects of cognitive biases on interpretation of rule-based machine learning models
Kliegr, Tomáš, Bahník, Štěpán, Fürnkranz, Johannes
This paper investigates to what extent do cognitive biases affect human understanding of interpretable machine learning models, in particular of rules discovered from data. Twenty cognitive biases (illusions, effects) are covered, as are possibly effective debiasing techniques that can be adopted by designers of machine learning algorithms and software. While there seems no universal approach for eliminating all the identified cognitive biases, it follows from our analysis that the effect of most biases can be ameliorated by making rule-based models more concise. Due to lack of previous research, our review transfers general results obtained in cognitive psychology to the domain of machine learning. It needs to be succeeded by empirical studies specifically aimed at the machine learning domain.