Optimization
LP-SparseMAP: Differentiable Relaxed Optimization for Sparse Structured Prediction
Niculae, Vlad, Martins, André F. T.
Structured prediction requires manipulating a large number of combinatorial structures, e.g., dependency trees or alignments, either as latent or output variables. Recently, the SparseMAP method has been proposed as a differentiable, sparse alternative to maximum a posteriori (MAP) and marginal inference. SparseMAP returns a combination of a small number of structures, a desirable property in some downstream applications. However, SparseMAP requires a tractable MAP inference oracle. This excludes, e.g., loopy graphical models or factor graphs with logic constraints, which generally require approximate inference. In this paper, we introduce LP-SparseMAP, an extension of SparseMAP that addresses this limitation via a local polytope relaxation. LP-SparseMAP uses the flexible and powerful domain specific language of factor graphs for defining and backpropagating through arbitrary hidden structure, supporting coarse decompositions, hard logic constraints, and higher-order correlations. We derive the forward and backward algorithms needed for using LP-SparseMAP as a hidden or output layer. Experiments in three structured prediction tasks show benefits compared to SparseMAP and Structured SVM.
Accelerating Block Coordinate Descent for Nonnegative Tensor Factorization
Ang, Andersen Man Shun, Cohen, Jeremy E., Gillis, Nicolas, Hien, Le Thi Khanh
A N -way array or N -th order tensor T is a multidimensional array in the product R I 1 ... I N of the vector spaces R I i for i 1, 2,...,N . A vector x R I 1 is a first-order tensor, and a matrix M R I 1 I 2 is a second-order tensor. The goal of NTF is to approximate a tensor T by a structured tensor X . Using the squared Frobenius norm as a distance metric, defined as nullXnull 2 F null j 1,j 2,...j NX 2 j 1j 2...j N, NTF is the following optimization problem: min a (i) p 0, 1 i N, 1 p r null null null null null nullT r null p 1 N null i 1a (i) p null null null null null null 2 F, (1) This work was supported by the Fonds de la Recherche Scientifique - FNRS and the Fonds Wetenschappelijk Onderzoek - Vlanderen (FWO) under EOS Project no O005318F-RG47, and by the European Research Council (ERC starting grant no 679515).
Exploiting Database Management Systems and Treewidth for Counting
Fichte, Johannes K., Hecher, Markus, Thier, Patrick, Woltran, Stefan
Bounded treewidth is one of the most cited combinatorial invariants, which was applied in the literature for solving several counting problems efficiently. A canonical counting problem is #SAT, which asks to count the satisfying assignments of a Boolean formula. Recent work shows that benchmarking instances for #SAT often have reasonably small treewidth. This paper deals with counting problems for instances of small treewidth. We introduce a general framework to solve counting questions based on state-of-the-art database management systems (DBMS). Our framework takes explicitly advantage of small treewidth by solving instances using dynamic programming (DP) on tree decompositions (TD). Therefore, we implement the concept of DP into a DBMS (PostgreSQL), since DP algorithms are already often given in terms of table manipulations in theory. This allows for elegant specifications of DP algorithms and the use of SQL to manipulate records and tables, which gives us a natural approach to bring DP algorithms into practice. To the best of our knowledge, we present the first approach to employ a DBMS for algorithms on TDs. A key advantage of our approach is that DBMS naturally allow to deal with huge tables with a limited amount of main memory (RAM), parallelization, as well as suspending computation.
Joint Reasoning for Multi-Faceted Commonsense Knowledge
Chalier, Yohan, Razniewski, Simon, Weikum, Gerhard
Commonsense knowledge (CSK) supports a variety of AI applications, from visual understanding to chatbots. Prior works on acquiring CSK, such as ConceptNet, have compiled statements that associate concepts, like everyday objects or activities, with properties that hold for most or some instances of the concept. Each concept is treated in isolation from other concepts, and the only quantitative measure (or ranking) of properties is a confidence score that the statement is valid. This paper aims to overcome these limitations by introducing a multi-faceted model of CSK statements and methods for joint reasoning over sets of inter-related statements. Our model captures four different dimensions of CSK statements: plausibility, typicality, remarkability and salience, with scoring and ranking along each dimension. For example, hyenas drinking water is typical but not salient, whereas hyenas eating carcasses is salient. For reasoning and ranking, we develop a method with soft constraints, to couple the inference over concepts that are related in in a taxonomic hierarchy. The reasoning is cast into an integer linear programming (ILP), and we leverage the theory of reduction costs of a relaxed LP to compute informative rankings. This methodology is applied to several large CSK collections. Our evaluation shows that we can consolidate these inputs into much cleaner and more expressive knowledge. Results are available at https://dice.mpi-inf.mpg.de.
Bayesian Quantile and Expectile Optimisation
Torossian, Léonard, Picheny, Victor, Durrande, Nicolas
Bayesian optimisation is widely used to optimise stochastic black box functions. While most strategies are focused on optimising conditional expectations, a large variety of applications require risk-averse decisions and alternative criteria accounting for the distribution tails need to be considered. In this paper, we propose new variational models for Bayesian quantile and expectile regression that are well-suited for heteroscedastic settings. Our models consist of two latent Gaussian processes accounting respectively for the conditional quantile (or expectile) and variance that are chained through asymmetric likelihood functions. Furthermore, we propose two Bayesian optimisation strategies, either derived from a GP-UCB or Thompson sampling, that are tailored to such models and that can accommodate large batches of points. As illustrated in the experimental section, the proposed approach clearly outperforms the state of the art.
POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning
Futoma, Joseph, Hughes, Michael C., Doshi-Velez, Finale
Many medical decision-making settings can be framed as partially observed Markov decision processes (POMDPs). However, popular two-stage approaches that first learn a POMDP model and then solve it often fail because the model that best fits the data may not be the best model for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in the kinds of batch, off-policy settings common in medicine. We demonstrate our approach on synthetic examples and a real-world hypotension management task.
Unbiased and Efficient Log-Likelihood Estimation with Inverse Binomial Sampling
van Opheusden, Bas, Acerbi, Luigi, Ma, Wei Ji
The fate of scientific hypotheses often relies on the ability of a computational model to explain the data, quantified in modern statistical approaches by the likelihood function. The log-likelihood is the key element for parameter estimation and model evaluation. However, the log-likelihood of complex models in fields such as computational biology and neuroscience is often intractable to compute analytically or numerically. In those cases, researchers can often only estimate the log-likelihood by comparing observed data with synthetic observations generated by model simulations. Standard techniques to approximate the likelihood via simulation either use summary statistics of the data or are at risk of producing severe biases in the estimate. Here, we explore another method, inverse binomial sampling (IBS), which can estimate the log-likelihood of an entire data set efficiently and without bias. For each observation, IBS draws samples from the simulator model until one matches the observation. The log-likelihood estimate is then a function of the number of samples drawn. The variance of this estimator is uniformly bounded, achieves the minimum variance for an unbiased estimator, and we can compute calibrated estimates of the variance. We provide theoretical arguments in favor of IBS and an empirical assessment of the method for maximum-likelihood estimation with simulation-based models. As case studies, we take three model-fitting problems of increasing complexity from computational and cognitive neuroscience. In all problems, IBS generally produces lower error in the estimated parameters and maximum log-likelihood values than alternative sampling methods with the same average number of samples. Our results demonstrate the potential of IBS as a practical, robust, and easy to implement method for log-likelihood evaluation when exact techniques are not available.
Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning
Zhang, Yao, Jarrett, Daniel, van der Schaar, Mihaela
An essential problem in automated machine learning (AutoML) is that of model selection. A unique challenge in the sequential setting is the fact that the optimal model itself may vary over time, depending on the distribution of features and labels available up to each point in time. In this paper, we propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection in this setting. This is accomplished by treating the performance at each time step as its own black-box function. In order to solve the resulting multiple black-box function optimization problem jointly and efficiently, we exploit potential correlations among black-box functions using deep kernel learning (DKL). To the best of our knowledge, we are the first to formulate the problem of stepwise model selection (SMS) for sequence prediction, and to design and demonstrate an efficient joint-learning algorithm for this purpose. Using multiple real-world datasets, we verify that our proposed method outperforms both standard BO and multi-objective BO algorithms on a variety of sequence prediction tasks.
On Computation and Generalization of Generative Adversarial Imitation Learning
Chen, Minshuo, Wang, Yizhou, Liu, Tianyi, Yang, Zhuoran, Li, Xingguo, Wang, Zhaoran, Zhao, Tuo
Generative Adversarial Imitation Learning (GAIL) is a powerful and practical approach for learning sequential decision-making policies. Different from Reinforcement Learning (RL), GAIL takes advantage of demonstration data by experts (e.g., human), and learns both the policy and reward function of the unknown environment. Despite the significant empirical progresses, the theory behind GAIL is still largely unknown. The major difficulty comes from the underlying temporal dependency of the demonstration data and the minimax computational formulation of GAIL without convex-concave structure. To bridge such a gap between theory and practice, this paper investigates the theoretical properties of GAIL. Specifically, we show: (1) For GAIL with general reward parameterization, the generalization can be guaranteed as long as the class of the reward functions is properly controlled; (2) For GAIL, where the reward is parameterized as a reproducing kernel function, GAIL can be efficiently solved by stochastic first order optimization algorithms, which attain sublinear convergence to a stationary solution. To the best of our knowledge, these are the first results on statistical and computational guarantees of imitation learning with reward/policy function approximation. Numerical experiments are provided to support our analysis.
Latent Factor Analysis of Gaussian Distributions under Graphical Constraints
Hasan, Md Mahmudul, Wei, Shuangqing, Moharrer, Ali
Latent Factor Analysis of Gaussian Distributions under Graphical Constraints Md Mahmudul Hasan, Shuangqing Wei, Ali Moharrer Abstract --We explore the algebraic structure of the solution space of convex optimization problem Constrained Minimum Trace Factor Analysis (CMTF A), when the population covariance matrix Σ x has an additional latent graphical constraint, namely, a latent star topology. In particular, we have shown that CMTF A can have either a rank 1 or a rank n 1 solution and nothing in between. We found explicit conditions for both rank 1 and rank n 1 solutions for CMTF A solution of Σ x. As a basic attempt towards building a more general Gaussian tree, we have found a necessary and a sufficient condition for multiple clusters, each having rank 1 CMTF A solution, to satisfy a minimum probability to combine together to build a Gaussian tree. T o support our analytical findings we have presented some numerical demonstrating the usefulness of the contributions of our work. Index T erms --Factor Analysis, MTF A, CMTF A, CMDF A I. INTRODUCTION A. Motivation Factor Analysis (FA) is a commonly used tool in multivariate statistics to represent the correlation structure of a set of observables in terms of significantly smaller number of variables called "latent factors". With the growing use in data mining, high dimensional data analytics, factor analysis has already become a prolific area of research [1] [2]. Classical factor analysis models seek to decompose the correlation matrix of an n -dimensional random vector X R n, Σ x, as the sum of a diagonal matrix D and a Gramian matrix Σ x D .