Directed Networks
Regression from Dependent Observations
Daskalakis, Constantinos, Dikkala, Nishanth, Panageas, Ioannis
The standard linear and logistic regression models assume that the response variables are independent, but share the same linear relationship to their corresponding vectors of covariates. The assumption that the response variables are independent is, however, too strong. In many applications, these responses are collected on nodes of a network, or some spatial or temporal domain, and are dependent. Examples abound in financial and meteorological applications, and dependencies naturally arise in social networks through peer effects. Regression with dependent responses has thus received a lot of attention in the Statistics and Economics literature, but there are no strong consistency results unless multiple independent samples of the vectors of dependent responses can be collected from these models. We present computationally and statistically efficient methods for linear and logistic regression models when the response variables are dependent on a network. Given one sample from a networked linear or logistic regression model and under mild assumptions, we prove strong consistency results for recovering the vector of coefficients and the strength of the dependencies, recovering the rates of standard regression under independent observations. We use projected gradient descent on the negative log-likelihood, or negative log-pseudolikelihood, and establish their strong convexity and consistency using concentration of measure for dependent random variables.
Multi-fidelity classification using Gaussian processes: accelerating the prediction of large-scale computational models
Costabal, Francisco Sahli, Perdikaris, Paris, Kuhl, Ellen, Hurtado, Daniel E.
Machine learning techniques typically rely on large datasets to create accurate classifiers. However, there are situations when data is scarce and expensive to acquire. This is the case of studies that rely on state-of-the-art computational models which typically take days to run, thus hindering the potential of machine learning tools. In this work, we present a novel classifier that takes advantage of lower fidelity models and inexpensive approximations to predict the binary output of expensive computer simulations. We postulate an autoregressive model between the different levels of fidelity with Gaussian process priors. We adopt a fully Bayesian treatment for the hyper-parameters and use Markov Chain Mont Carlo samplers. We take advantage of the probabilistic nature of the classifier to implement active learning strategies. We also introduce a sparse approximation to enhance the ability of themulti-fidelity classifier to handle large datasets. We test these multi-fidelity classifiers against their single-fidelity counterpart with synthetic data, showing a median computational cost reduction of 23% for a target accuracy of 90%. In an application to cardiac electrophysiology, the multi-fidelity classifier achieves an F1 score, the harmonic mean of precision and recall, of 99.6% compared to 74.1% of a single-fidelity classifier when both are trained with 50 samples. In general, our results show that the multi-fidelity classifiers outperform their single-fidelity counterpart in terms of accuracy in all cases. We envision that this new tool will enable researchers to study classification problems that would otherwise be prohibitively expensive. Source code is available at https://github.com/fsahli/MFclass.
Importance Weighted Hierarchical Variational Inference
Sobolev, Artem, Vetrov, Dmitry
Variational Inference is a powerful tool in the Bayesian modeling toolkit, however, its effectiveness is determined by the expressivity of the utilized variational distributions in terms of their ability to match the true posterior distribution. In turn, the expressivity of the variational family is largely limited by the requirement of having a tractable density function. To overcome this roadblock, we introduce a new family of variational upper bounds on a marginal log density in the case of hierarchical models (also known as latent variable models). We then give an upper bound on the Kullback-Leibler divergence and derive a family of increasingly tighter variational lower bounds on the otherwise intractable standard evidence lower bound for hierarchical variational distributions, enabling the use of more expressive approximate posteriors. We show that previously known methods, such as Hierarchical Variational Models, Semi-Implicit Variational Inference and Doubly Semi-Implicit Variational Inference can be seen as special cases of the proposed approach, and empirically demonstrate superior performance of the proposed method in a set of experiments.
Meta-learning of Sequential Strategies
Ortega, Pedro A., Wang, Jane X., Rowland, Mark, Genewein, Tim, Kurth-Nelson, Zeb, Pascanu, Razvan, Heess, Nicolas, Veness, Joel, Pritzel, Alex, Sprechmann, Pablo, Jayakumar, Siddhant M., McGrath, Tom, Miller, Kevin, Azar, Mohammad, Osband, Ian, Rabinowitz, Neil, Gyรถrgy, Andrรกs, Chiappa, Silvia, Osindero, Simon, Teh, Yee Whye, van Hasselt, Hado, de Freitas, Nando, Botvinick, Matthew, Legg, Shane
In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently exploit task structure. Furthermore, we recast memory-based meta-learning within a Bayesian framework, showing that the meta-learned strategies are near-optimal because they amortize Bayes-filtered data, where the adaptation is implemented in the memory dynamics as a state-machine of sufficient statistics. Essentially, memory-based meta-learning translates the hard problem of probabilistic sequential inference into a regression problem.
Unified Adversarial Invariance
Jaiswal, Ayush, Wu, Yue, AbdAlmageed, Wael, Natarajan, Premkumar
We present a unified invariance framework for supervised neural networks that can induce independence to nuisance factors of data without using any nuisance annotations, but can additionally use labeled information about biasing factors to force their removal from the latent embedding for making fair predictions. Invariance to nuisance is achieved by learning a split representation of data through competitive training between the prediction task and a reconstruction task coupled with disentanglement, whereas that to biasing factors is brought about by penalizing the network if the latent embedding contains any information about them. We describe an adversarial instantiation of this framework and provide analysis of its working. Our model outperforms previous works at inducing invariance to nuisance factors without using any labeled information about such variables, and achieves state-of-the-art performance at learning independence to biasing factors in fairness settings.
Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review
Ghojogh, Benyamin, Samad, Maria N., Mashhadi, Sayema Asif, Kapoor, Tania, Ali, Wahab, Karray, Fakhri, Crowley, Mark
Pattern analysis often requires a pre-processing stage for extracting or selecting features in order to help the classification, prediction, or clustering stage discriminate or represent the data in a better way. The reason for this requirement is that the raw data are complex and difficult to process without extracting or selecting appropriate features beforehand. This paper reviews theory and motivation of different common methods of feature selection and extraction and introduces some of their applications. Some numerical implementations are also shown for these methods. Finally, the methods in feature selection and extraction are compared.
Interpretable Outcome Prediction with Sparse Bayesian Neural Networks in Intensive Care
Popkes, Anna-Lena, Overweg, Hiske, Ercole, Ari, Li, Yingzhen, Hernรกndez-Lobato, Josรฉ Miguel, Zaykov, Yordan, Zhang, Cheng
Clinical decision making is challenging because of pathological complexity, as well as large amounts of heterogeneous data generated as part of routine clinical care. In recent years, machine learning tools have been developed to aid this process. Intensive care unit (ICU) admissions represent the most data dense and time-critical patient care episodes. In this context, prediction models may help clinicians determine which patients are most at risk and prioritize care. However, flexible tools such as artificial neural networks (ANNs) suffer from a lack of interpretability limiting their acceptability to clinicians. In this work, we propose a novel interpretable Bayesian neural network architecture which offers both the flexibility of ANNs and interpretability in terms of feature selection. In particular, we employ a sparsity inducing prior distribution in a tied manner to learn which features are important for outcome prediction. We evaluate our approach on the task of mortality prediction using two real-world ICU cohorts. In collaboration with clinicians we found that, in addition to the predicted outcome results, our approach can provide novel insights into the importance of different clinical measurements. This suggests that our model can support medical experts in their decision making process.
Learning Clique Forests
Massara, Guido Previde, Aste, Tomaso
We propose a topological learning algorithm for the estimation of the conditional dependency structure of large sets of random variables from sparse and noisy data. The algorithm, named Maximally Filtered Clique Forest (MFCF), produces a clique forest and an associated Markov Random Field (MRF) by generalising Prim's minimum spanning tree algorithm. To the best of our knowledge, the MFCF presents three elements of novelty with respect to existing structure learning approaches. The first is the repeated application of a local topological move, the clique expansion, that preserves the decomposability of the underlying graph. Through this move the decomposability and calculation of scores is performed incrementally at the variable (rather than edge) level, and this provides better computational performance and an intuitive application of multivariate statistical tests. The second is the capability to accommodate a variety of score functions and, while this paper is focused on multivariate normal distributions, it can be directly generalised to different types of statistics. Finally, the third is the variable range of allowed clique sizes which is an adjustable topological constraint that acts as a topological penalizer providing a way to tackle sparsity at $l_0$ semi-norm level; this allows a clean decoupling of structure learning and parameter estimation. The MFCF produces a representation of the clique forest, together with a perfect ordering of the cliques and a perfect elimination ordering for the vertices. As an example we propose an application to covariance selection models and we show that the MCFC outperforms the Graphical Lasso for a number of classes of matrices.
Direct Estimation of Position Bias for Unbiased Learning-to-Rank without Intervention
Aslanyan, Grigor, Porwal, Utkarsh
The Unbiased Learning-to-Rank framework has been recently proposed as a general approach to systematically remove biases, such as position bias, from learning-to-rank models. The method takes two steps - estimating click propensities and using them to train unbiased models. Most common methods proposed in the literature for estimating propensities involve some degree of intervention in the live search engine. An alternative approach proposed recently uses an Expectation Maximization (EM) algorithm to estimate propensities by using ranking features for estimating relevances. In this work we propose a novel method to directly estimate propensities which does not use any intervention in live search or rely on modeling relevance. Rather, we take advantage of the fact that the same query-document pair may naturally change ranks over time. This typically occurs for eCommerce search because of change of popularity of items over time, existence of time dependent ranking features, or addition or removal of items to the index (an item getting sold or a new item being listed). However, our method is general and can be applied to any search engine for which the rank of the same document may naturally change over time for the same query. We derive a simple likelihood function that depends on propensities only, and by maximizing the likelihood we are able to get estimates of the propensities. We apply this method to eBay search data to estimate click propensities for web and mobile search and compare these with estimates using the EM method. We also use simulated data to show that the method gives reliable estimates of the "true" simulated propensities. Finally, we train an unbiased learning-to-rank model for eBay search using the estimated propensities and show that it outperforms both baselines - one without position bias correction and one with position bias correction using the EM method.
Conditional Generative Neural System for Probabilistic Trajectory Prediction
Li, Jiachen, Ma, Hengbo, Tomizuka, Masayoshi
Effective understanding of the environment and accurate trajectory prediction of surrounding dynamic obstacles are critical for intelligent systems such as autonomous vehicles and wheeled mobile robotics navigating in complex scenarios to achieve safe and high-quality decision making, motion planning and control. Due to the uncertain nature of the future, it is desired to make inference from a probability perspective instead of deterministic prediction. In this paper, we propose a conditional generative neural system (CGNS) for probabilistic trajectory prediction to approximate the data distribution, with which realistic, feasible and diverse future trajectory hypotheses can be sampled. The system combines the strengths of conditional latent space learning and variational divergence minimization, and leverages both static context and interaction information with soft attention mechanisms. We also propose a regularization method for incorporating soft constraints into deep neural networks with differentiable barrier functions, which can regulate and push the generated samples into the feasible regions. The proposed system is evaluated on several public benchmark datasets for pedestrian trajectory prediction and a roundabout naturalistic driving dataset collected by ourselves. The experiment results demonstrate that our model achieves better performance than various baseline approaches in terms of prediction accuracy.