Goto

Collaborating Authors

 Bayesian Inference


Augmenting and Tuning Knowledge Graph Embeddings

arXiv.org Artificial Intelligence

Knowledge graph embeddings rank among the most successful methods for link prediction in knowledge graphs, i.e., the task of completing an incomplete collection of relational facts. A downside of these models is their strong sensitivity to model hyperparameters, in particular regularizers, which have to be extensively tuned to reach good performance [Kadlec et al., 2017]. We propose an efficient method for large scale hyperparameter tuning by interpreting these models in a probabilistic framework. After a model augmentation that introduces per-entity hyperparameters, we use a variational expectation-maximization approach to tune thousands of such hyperparameters with minimal additional cost. Our approach is agnostic to details of the model and results in a new state of the art in link prediction on standard benchmark data.


Coupling techniques for nonlinear ensemble filtering

arXiv.org Machine Learning

We consider filtering in high-dimensional non-Gaussian state-space models with intractable transition kernels, nonlinear and possibly chaotic dynamics, and sparse observations in space and time. We propose a novel filtering methodology that harnesses transportation of measures, convex optimization, and ideas from probabilistic graphical models to yield robust ensemble approximations of the filtering distribution in high dimensions. Our approach can be understood as the natural generalization of the ensemble Kalman filter (EnKF) to nonlinear updates, using stochastic or deterministic couplings. The use of nonlinear updates can reduce the intrinsic bias of the EnKF at a marginal increase in computational cost. We avoid any form of importance sampling and introduce non-Gaussian localization approaches for dimension scalability. Our framework achieves state-of-the-art tracking performance on challenging configurations of the Lorenz-96 model in the chaotic regime.


Estimating Treatment Effect under Additive Hazards Models with High-dimensional Covariates

arXiv.org Machine Learning

Estimating causal effects for survival outcomes in the high-dimensional setting is an extremely important topic for many biomedical applications as well as areas of social sciences. We propose a new orthogonal score method for treatment effect estimation and inference that results in asymptotically valid confidence intervals assuming only good estimation properties of the hazard outcome model and the conditional probability of treatment. This guarantee allows us to provide valid inference for the conditional treatment effect under the high-dimensional additive hazards model under considerably more generality than existing approaches. In addition, we develop a new Hazards Difference (HDi), estimator. We showcase that our approach has double-robustness properties in high dimensions: with cross-fitting, the HDi estimate is consistent under a wide variety of treatment assignment models; the HDi estimate is also consistent when the hazards model is misspecified and instead the true data generating mechanism follows a partially linear additive hazards model. We further develop a novel sparsity doubly robust result, where either the outcome or the treatment model can be a fully dense high-dimensional model. We apply our methods to study the treatment effect of radical prostatectomy versus conservative management for prostate cancer patients using the SEER-Medicare Linked Data.


Deep Gamblers: Learning to Abstain with Portfolio Theory

arXiv.org Machine Learning

We deal with the \textit{selective classification} problem (supervised-learning problem with a rejection option), where we want to achieve the best performance at a certain level of coverage of the data. We transform the original $m$-class classification problem to $(m+1)$-class where the $(m+1)$-th class represents the model abstaining from making a prediction due to uncertainty. Inspired by portfolio theory, we propose a loss function for the selective classification problem based on the doubling rate of gambling. We show that minimizing this loss function has a natural interpretation as maximizing the return of a \textit{horse race}, where a player aims to balance between betting on an outcome (making a prediction) when confident and reserving one's winnings (abstaining) when not confident. This loss function allows us to train neural networks and characterize the uncertainty of prediction in an end-to-end fashion. In comparison with previous methods, our method requires almost no modification to the model inference algorithm or neural architecture. Experimentally, we show that our method can identify both uncertain and outlier data points, and achieves strong results on SVHN and CIFAR10 at various coverages of the data.


Bias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions

arXiv.org Machine Learning

Hierarchical probabilistic models are able to use a large number of parameters to create a model with a high representation power. However, it is well known that increasing the number of parameters also increases the complexity of the model which leads to a bias-variance trade-off. Although it is a classical problem, the bias-variance trade-off between hidden layers and higher-order interactions have not been well studied. In our study, we propose an efficient inference algorithm for the log-linear formulation of the higher-order Boltzmann machine using a combination of Gibbs sampling and annealed importance sampling. We then perform a bias-variance decomposition to study the differences in hidden layers and higher-order interactions. Our results have shown that using hidden layers and higher-order interactions have a comparable error with a similar order of magnitude and using higher-order interactions produce less variance for smaller sample size.


Learning Markov models via low-rank optimization

arXiv.org Machine Learning

Modeling unknown systems from data is a precursor of system optimization and sequential decision making. In this paper, we focus on learning a Markov model from a single trajectory of states. Suppose that the transition model has a small rank despite of a large state space, meaning that the system admits a low-dimensional latent structure. We show that one can estimate the full transition model accurately using a trajectory of length that is proportional to the total number of states. We propose two maximum likelihood estimation methods: a convex approach with nuclear-norm regularization and a nonconvex approach with rank constraint. We show that both estimators enjoy optimal statistical rates in terms of the Kullback-Leiber divergence and the $\ell_2$ error. For computing the nonconvex estimator, we develop a novel DC (difference of convex function) programming algorithm that starts with the convex M-estimator and then successively refines the solution till convergence. Empirical experiments demonstrate consistent superiority of the nonconvex estimator over the convex one.


Consensus Monte Carlo for Random Subsets using Shared Anchors

arXiv.org Machine Learning

We present a consensus Monte Carlo algorithm that scales existing Bayesian nonparametric models for clustering and feature allocation to big data. The algorithm is valid for any prior on random subsets such as partitions and latent feature allocation, under essentially any sampling model. Motivated by three case studies, we focus on clustering induced by a Dirichlet process mixture sampling model, inference under an Indian buffet process prior with a binomial sampling model, and with a categorical sampling model. We assess the proposed algorithm with simulation studies and show results for inference with three datasets: an MNIST image dataset, a dataset of pancreatic cancer mutations, and a large set of electronic health records (EHR). Supplementary materials for this article are available online.


Uncertainty Estimates for Ordinal Embeddings

arXiv.org Machine Learning

To investigate objects without a describable notion of distance, one can gather ordinal information by asking triplet comparisons of the form "Is object $x$ closer to $y$ or is $x$ closer to $z$?" In order to learn from such data, the objects are typically embedded in a Euclidean space while satisfying as many triplet comparisons as possible. In this paper, we introduce empirical uncertainty estimates for standard embedding algorithms when few noisy triplets are available, using a bootstrap and a Bayesian approach. In particular, simulations show that these estimates are well calibrated and can serve to select embedding parameters or to quantify uncertainty in scientific applications.


A Simultaneous Transformation and Rounding Approach for Modeling Integer-Valued Data

arXiv.org Machine Learning

Integer-valued and count data are ubiquitous in many fields, including epidemiology (Osthus et al., 2018; Kowal, 2019), ecology (Dorazio et al., 2005), and insurance (Bening and Korolev, 2012), among others (Cameron and Trivedi, 2013). Count data also serve as an indicator of demand, such as the demand for medical services (Deb and Trivedi, 1997), emergency medical services (Matteson et al., 2011), and call center access (Shen and Huang, 2008). In these applications and many others, integer-valued data are frequently observed jointly with predictors, over time intervals, or across spatial locations. Integer-valued data also exhibit a variety of distributional features, including zero-inflation, skewness, over-or underdispersion, and in some cases may be bounded or censored. Flexible and interpretable models for integervalued processes are therefore highly useful in practice. The most widely-used models for count data build upon the Poisson distribution. However, the limitations of the Poisson distribution are well-known: the distribution is not sufficiently flexible in practice and cannot account for zero-inflation or over-and underdispersion. A common strategy is to generalize the Poisson model by introducing additional parameters.


Modeling Food Popularity Dependencies using Social Media data

arXiv.org Machine Learning

The rise in popularity of major social media platforms have enabled people to share photos and textual information about their daily life. One of the popular topics about which information is shared is food. Since a lot of media about food are attributed to particular locations and restaurants, information like popularity of spatio-temporal popularity of various cuisines can be analysed. Tracking the popularity of food types and retail locations across space and time can also be useful for business owners and restaurant investors. In this work, we present an approach using off-the shelf machine learning techniques to identify trends and popularity of cuisine types in an area using geo-tagged data from social media, Google images and Yelp. After adjusting for time, we use the Kernel Density Estimation to get hot spots across the location and model the dependencies among food cuisines popularity using Bayesian Networks. We consider the Manhattan borough of New York City as the location for our analyses but the approach can be used for any area with social media data and information about retail businesses.