Directed Networks
Estimating latent feature-feature interactions in large feature-rich graphs
Real-world complex networks describe connections between objects; in reality, those objects are often endowed with some kind of features. How does the presence or absence of such features interplay with the network link structure? Although the situation described here is truly ubiquitous, there is a limited body of research dealing with large graphs of this kind. Many previous works considered homophily as the only possible transmission mechanism translating node features into links. Other authors, instead, developed more sophisticated models that are able to handle complex feature interactions, but are unfit to scale to very large networks. We expand on the MGJ model, where interactions between pairs of features can foster or discourage link formation. In this work, we investigate how to estimate the latent feature-feature interactions in this model. We propose two solutions: the first one assumes feature independence and is essentially based on Naive Bayes; the second one, which relaxes the independence assumption, is based on perceptrons. In fact, we show it is possible to cast the model equation so that it can be seen as the prediction rule of a perceptron. We analyze how classical results for perceptrons can be interpreted in this context; then, we define a fast and simple perceptron-like algorithm for this task, which can process $10^8$ links in minutes. We then compare these two techniques, first with synthetic datasets that follow our model, gaining evidence that the Naive independence assumptions are detrimental in practice. Secondly, we consider a real, large-scale citation network where each node (i.e., paper) can be described by different types of characteristics; there, our algorithm can assess how well each set of features can explain the links, and thus find meaningful latent feature-feature interactions.
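The idea of reading the model equation as a perceptron prediction rule can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each node carries a set of binary features, that a link is predicted when the summed weights over all (source feature, target feature) pairs are positive, and that non-links are supplied as explicit negative examples. The names `train_interactions`, `links`, and `features` are hypothetical.

```python
def train_interactions(links, features, epochs=5):
    """Perceptron-style estimate of feature-feature interaction weights.

    links:    list of (source, target, label), label +1 for a link, -1 for a non-link
    features: dict mapping each node to the set of features it has
    Returns a dict mapping (source_feature, target_feature) to a weight.
    """
    weights = {}
    for _ in range(epochs):
        mistakes = 0
        for source, target, label in links:
            # Every pair of (source feature, target feature) is one input dimension.
            pairs = [(h, k) for h in features[source] for k in features[target]]
            score = sum(weights.get(pair, 0.0) for pair in pairs)
            predicted = 1 if score > 0 else -1
            if predicted != label:
                # Classical perceptron update: move the active weights toward the label.
                for pair in pairs:
                    weights[pair] = weights.get(pair, 0.0) + label
                mistakes += 1
        if mistakes == 0:
            break  # every example is classified correctly
    return weights


# Toy data (assumed): papers on the same topic cite each other, others do not.
features = {"a": {"ml"}, "b": {"ml"}, "c": {"bio"}}
links = [("a", "b", 1), ("a", "c", -1), ("c", "b", -1)]
weights = train_interactions(links, features)
```

A positive weight for a feature pair suggests that the pair fosters link formation; a weight at or below zero suggests that it does not, or that it discourages it.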
A GAMP Based Low Complexity Sparse Bayesian Learning Algorithm
Al-Shoukairi, Maher, Schniter, Philip, Rao, Bhaskar D.
In this paper, we present an algorithm for the sparse signal recovery problem that incorporates damped Gaussian generalized approximate message passing (GGAMP) into Expectation-Maximization (EM)-based sparse Bayesian learning (SBL). In particular, GGAMP is used to implement the E-step in SBL in place of matrix inversion, leveraging the fact that GGAMP is guaranteed to converge with appropriate damping. The resulting GGAMP-SBL algorithm is much more robust to an arbitrary measurement matrix A than the standard damped GAMP algorithm, while having much lower complexity than the standard SBL algorithm. We then extend the approach from the single measurement vector (SMV) case to the temporally correlated multiple measurement vector (MMV) case, leading to the GGAMP-TSBL algorithm. We verify the robustness and computational advantages of the proposed algorithms through numerical experiments. The problem of sparse signal recovery (SSR) and the related problem of compressed sensing have received much attention in recent years [1]-[6]. Despite the difficulty in solving this problem [7], an important finding in recent years is that for a sufficiently sparse x and a well designed A, accurate recovery is possible by techniques such as basis pursuit and orthogonal matching pursuit [8]-[10]. The SSR problem has seen considerable advances on the algorithmic front, including iteratively reweighted algorithms [11]-[13] and Bayesian techniques [14]-[20], among others. Two Bayesian techniques related to this work are the generalized approximate message passing (GAMP) and the sparse Bayesian learning (SBL) algorithms.
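For orientation, the matrix inversion that GGAMP replaces can be seen in the textbook form of EM-based SBL. The updates below rest on assumptions not stated in this excerpt: a linear model $y = Ax + e$ with Gaussian noise of variance $\sigma^2$, and a prior $x_i \sim \mathcal{N}(0, \gamma_i)$ with $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_N)$. The symbols $y$, $e$, $\sigma^2$, $\gamma$, $\Sigma$, and $\mu$ are introduced here for illustration only.

```latex
\[
\begin{aligned}
\text{E-step:}\quad & \Sigma = \left(\sigma^{-2} A^{\top} A + \Gamma^{-1}\right)^{-1},
  \qquad \mu = \sigma^{-2}\, \Sigma A^{\top} y \\
\text{M-step:}\quad & \gamma_i \leftarrow \mu_i^{2} + \Sigma_{ii}
\end{aligned}
\]
```

Under these assumptions, computing $\Sigma$ is the expensive step, which is why an approximate posterior from message passing would lower the per-iteration cost.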
Correlated Equilibria for Approximate Variational Inference in MRFs
Ortiz, Luis E., Wang, Boshen, Gong, Ze
Almost all of the work in graphical models for game theory has mirrored previous work in probabilistic graphical models. Our work considers the opposite direction: taking advantage of recent advances in equilibrium computation for probabilistic inference. We present formulations of inference problems in Markov random fields (MRFs) as computation of equilibria in a certain class of game-theoretic graphical models. We concretely establish the precise connection between variational probabilistic inference in MRFs and correlated equilibria. No previous work exploits recent theoretical and empirical results from the literature on algorithmic and computational game theory on the tractable, polynomial-time computation of exact or approximate correlated equilibria in graphical games with arbitrary, loopy graph structure. We discuss how to design new algorithms with equally tractable guarantees for the computation of approximate variational inference in MRFs. Also, inspired by a previously stated game-theoretic view of state-of-the-art tree-reweighted (TRW) message-passing techniques for belief inference as a zero-sum game, we propose a different, general-sum potential game to design approximate fictitious-play techniques. We perform synthetic experiments evaluating our proposed approximation algorithms against standard methods and TRW on several classes of classical Ising models (i.e., with binary random variables). We also evaluate the algorithms using Ising models learned from the MNIST dataset. Our experiments show that our global approach is competitive, particularly shining in a class of Ising models with constant, "highly attractive" edge-weights, in which it is often better than all other alternatives we evaluated. With a notable exception, our more local approach was not as effective. Yet, in fairness, almost all of the alternatives are often no better than a simple baseline: estimate 0.5.
Stacked Structure Learning for Lifted Relational Neural Networks
Sourek, Gustav, Svatos, Martin, Zelezny, Filip, Schockaert, Steven, Kuzelka, Ondrej
Lifted Relational Neural Networks (LRNNs [15]) are weighted sets of first-order rules, which are used to construct feed-forward neural networks from relational structures. A central characteristic of LRNNs is that a different neural network is constructed for each learning example, but crucially, the weights of these different neural networks are shared. This allows LRNNs to use neural networks for learning in relational domains, despite the fact that training examples may vary considerably in size and structure. In previous work, LRNNs have been learned from handcrafted rules. In such cases, only the weights of the first-order rules have to be learned from training data, which can be accomplished using a variant of back-propagation. The use of handcrafted rules offers a natural way to incorporate domain knowledge in the learning process. In some applications, however, (sufficient) domain knowledge is lacking and both the rules and their weights have to be learned from data. To this end, in this paper we introduce a structure learning method for LRNNs. Our proposed structure learning method proceeds in an iterative fashion.
Learning Graphical Models from a Distributed Stream
Zhang, Yu, Tirthapura, Srikanta, Cormode, Graham
A current challenge for data management systems is to support the construction and maintenance of machine learning models over data that is large, multi-dimensional, and evolving. While systems that could support these tasks are emerging, the need to scale to distributed, streaming data requires new models and algorithms. In this setting, as well as computational scalability and model accuracy, we also need to minimize the amount of communication between distributed processors, which is the chief component of latency. We study Bayesian networks, the workhorse of graphical models, and present a communication-efficient method for continuously learning and maintaining a Bayesian network model over data that is arriving as a distributed stream partitioned across multiple processors. We show a strategy for maintaining model parameters that leads to an exponential reduction in communication when compared with baseline approaches to maintain the exact MLE (maximum likelihood estimation). Meanwhile, our strategy provides similar prediction errors for the target distribution and for classification tasks.
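As a point of reference, the exact-MLE baseline mentioned above can be sketched as follows. This is only the baseline idea, not the communication-efficient strategy of the paper: each site keeps local counts, a coordinator sums them, and a conditional probability is a ratio of merged counts. The function names and the toy "rain/wet" variables are hypothetical.

```python
from collections import Counter


def merge_counts(site_counters):
    """Sum the per-site counters into one global counter."""
    total = Counter()
    for counter in site_counters:
        total.update(counter)  # Counter.update adds counts
    return total


def mle_conditional(joint_counts, child_value, parent_values):
    """MLE of P(child = child_value | parents = parent_values) from counts.

    joint_counts maps (child_value, parent_values) to an observation count.
    Returns None when the parent configuration has never been observed.
    """
    parent_total = sum(
        count for (_, parents), count in joint_counts.items()
        if parents == parent_values
    )
    if parent_total == 0:
        return None
    return joint_counts[(child_value, parent_values)] / parent_total


# Two sites observe (ground state, weather) pairs from their part of the stream.
site_1 = Counter({("wet", ("rain",)): 3, ("dry", ("rain",)): 1})
site_2 = Counter({("wet", ("rain",)): 1, ("dry", ("rain",)): 3})
merged = merge_counts([site_1, site_2])
p_wet_given_rain = mle_conditional(merged, "wet", ("rain",))
```

In this naive form every update must reach the coordinator, which is the communication cost that the approach described above aims to reduce.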
Bayesian Learning for Statistical Classification – Stats and Bots
A well-calibrated estimator for the conditional probabilities should obey this equation. Once we have derived a statistical classifier, we need to validate it on some test data. This data should be different from that used to train the classifier; otherwise, skill scores will be unduly optimistic. This is known as cross-validation. The confusion matrix expresses everything about the accuracy of a discrete classifier over a given database, and you can use it to compose any possible skill score. Here, we are going to cover two that are rarely seen in the literature, but are nonetheless important for reasons that will become clear.
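A small sketch of how a confusion matrix is built and how a skill score is composed from it. Plain accuracy is used only as a familiar example; the two scores the article goes on to cover are not named in this excerpt. Class labels are assumed to be integers from 0 to `n_classes - 1`, and the function names are illustrative.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """matrix[t][p] counts test samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. classified correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Held-out test data (toy values): true classes and the classifier's predictions.
true_labels = [0, 0, 1, 1]
predicted_labels = [0, 1, 1, 1]
matrix = confusion_matrix(true_labels, predicted_labels, n_classes=2)
score = accuracy(matrix)
```

Any other skill score can be written as a function of the same matrix, so the matrix only needs to be computed once per test set.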
Learning Functional Causal Models with Generative Neural Networks
Goudet, Olivier, Kalainathan, Diviyan, Caillou, Philippe, Lopez-Paz, David, Guyon, Isabelle, Sebag, Michèle, Tritas, Aris, Tubaro, Paola
We introduce a new approach to functional causal modeling from observational data. The approach, called Causal Generative Neural Networks (CGNN), leverages the power of neural networks to learn a generative model of the joint distribution of the observed variables, by minimizing the Maximum Mean Discrepancy between generated and observed data. An approximate learning criterion is proposed to scale the computational cost of the approach to linear complexity in the number of observations. The performance of CGNN is studied through three experiments. First, we apply CGNN to the problem of cause-effect inference, where two CGNNs modeling $P(Y|X,\textrm{noise})$ and $P(X|Y,\textrm{noise})$ are used to identify the best causal hypothesis out of $X\rightarrow Y$ and $Y\rightarrow X$. Second, CGNN is applied to the problem of identifying v-structures and conditional independences. Third, we apply CGNN to the problem of multivariate functional causal modeling: given a skeleton describing the dependences in a set of random variables $\{X_1, \ldots, X_d\}$, CGNN orients the edges in the skeleton to uncover the directed acyclic causal graph describing the causal structure of the random variables. On all three tasks, CGNN is extensively assessed on both artificial and real-world data, comparing favorably to the state-of-the-art. Finally, we extend CGNN to handle the case of confounders, where latent variables are involved in the overall causal model.
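The Maximum Mean Discrepancy used as the training criterion can be illustrated with a minimal estimator. This sketch assumes a single Gaussian kernel with a fixed bandwidth and uses the biased (V-statistic) estimate, which is quadratic in the number of samples; it is not the approximate linear-time criterion mentioned above, and the kernel choice and function names are assumptions.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(observed, generated, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples."""
    def mean_kernel(first, second):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in first for v in second)
        return total / (len(first) * len(second))

    return (mean_kernel(observed, observed)
            + mean_kernel(generated, generated)
            - 2.0 * mean_kernel(observed, generated))


# Toy one-dimensional samples: one generated set matches the data, one does not.
observed = [(0.0,), (1.0,)]
close_sample = [(0.0,), (1.0,)]
far_sample = [(5.0,), (6.0,)]
mmd_close = mmd_squared(observed, close_sample)
mmd_far = mmd_squared(observed, far_sample)
```

A generator whose samples are distributed like the observed data drives this quantity toward zero, which is what makes it usable as a loss.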
Steps Toward Robust Artificial Intelligence
Recent advances in artificial intelligence are encouraging governments and corporations to deploy AI in high-stakes settings including driving cars autonomously, managing the power grid, trading on stock exchanges, and controlling autonomous weapons systems. Such applications require AI methods to be robust to both the known unknowns (those uncertain aspects of the world about which the computer can reason explicitly) and the unknown unknowns (those aspects of the world that are not captured by the system’s models). This article discusses recent progress in AI and then describes eight ideas related to robustness that are being pursued within the AI research community. While these ideas are a start, we need to devote more attention to the challenges of dealing with the known and unknown unknowns. These issues are fascinating, because they touch on the fundamental question of how finite systems can survive and thrive in a complex and dangerous world.
Estimating the Fundamental Limits is Easier than Achieving the Fundamental Limits
Jiao, Jiantao, Han, Yanjun, Fischer-Hwang, Irena, Weissman, Tsachy
Suppose there exist three machine learning experts who would like to understand the fundamental limits of classification (Bayes error) [1] for a specific dataset. Since the true distribution that generates the data is unknown, they take three different approaches: 1) Expert A: given empirical training samples, produce an estimate of the Bayes error that is (near) optimal statistically; 2) Expert B: construct a (near) optimal classifier based on the training sample, and then use its performance on the test set (which may have infinite size) to estimate the Bayes error; 3) Expert C: use the training error of a (near) optimal classification algorithm to estimate the Bayes error. We ask the question: are there any fundamental differences between experts A, B, and C? Evidently, expert A is not constrained to any specific approach as experts B and C are, but if B and C are using (near) optimal classification algorithms, would B or C achieve the same performance as A if A chooses to act optimally? Similar situations arise in the understanding of fundamental limits of data compression and sequential prediction under logarithmic loss, which are given by the Shannon entropy rate [2]. In this situation, there could exist four different experts: 1) A: would like to estimate the limits of compression (near) optimally; 2) B: would like to construct a predictor based on training samples and use its prediction accuracy under logarithmic loss on the test set (which may have infinite size) to estimate the limits; 3) C: would like to use the training error of a (near) optimal sequential predictor to estimate the limits; 4) D: would like to construct a (near) optimal data compressor and use its normalized code length to estimate the limits. In this situation, are there any fundamental differences between the tasks of these four experts?
Bardo: Emotion-Based Music Recommendation for Tabletop Role-Playing Games
Padovani, Rafael R. (Universidade Federal de Viçosa) | Ferreira, Lucas N. (University of California, Santa Cruz) | Lelis, Levi H. S. (Universidade Federal de Viçosa)
In this paper we introduce Bardo, a real-time intelligent system to automatically select the background music for tabletop role-playing games. Bardo uses an off-the-shelf speech recognition system to transform into text what the players say during a game session, and a supervised learning algorithm to classify the text into an emotion. Bardo then selects and plays as background music a song representing the classified emotion. We evaluate Bardo with a Dungeons and Dragons (D&D) campaign available on YouTube. Accuracy experiments show that a simple Naive Bayes classifier is able to obtain good prediction accuracy in our classification task. A user study in which people evaluated edited versions of the D&D videos suggests that Bardo's selections can be better than those used in the original videos of the campaign.
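The text-to-emotion step can be sketched with a small multinomial Naive Bayes classifier. This is an illustration under assumptions, not Bardo's implementation: it uses whitespace tokenization and add-one (Laplace) smoothing, and the emotion labels and example sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (transcribed text, emotion label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest posterior log-probability."""
    class_counts, word_counts, vocabulary = model
    total_examples = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_examples)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen (word, label) pairs from zeroing the score.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training sentences and labels.
examples = [
    ("the dragon attacks us", "agitated"),
    ("we rest at the quiet inn", "calm"),
]
model = train_naive_bayes(examples)
emotion = classify("dragon attacks", model)
```

In a pipeline like the one described, the predicted label would then select the background song for that emotion.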