Szummer, Martin
Balance Regularized Neural Network Models for Causal Effect Estimation
Farajtabar, Mehrdad, Lee, Andrew, Feng, Yuanjian, Gupta, Vishal, Dolan, Peter, Chandran, Harish, Szummer, Martin
Estimating individual and average treatment effects from observational data is an important problem in many domains such as healthcare and e-commerce. In this paper, we advocate balance regularization of multi-head neural network architectures. Our work is motivated by representation learning techniques to reduce differences between treated and untreated distributions that potentially arise due to confounding factors. We further regularize the model by encouraging it to predict control outcomes for individuals in the treatment group that are similar to control outcomes in the control group. We empirically study the bias-variance trade-off between different weightings of the regularizers, as well as between inductive and transductive inference.
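A minimal sketch of what such a model could look like, assuming a TARNet-style shared encoder with separate control and treatment heads, a linear (mean-matching) MMD penalty for representation balance, and a mean-matching form of the control-outcome regularizer; the layer sizes, kernel choice, and the weights alpha and beta are illustrative assumptions, not the paper's configuration:

```python
# Hedged sketch: balance-regularized multi-head network for treatment effects.
import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    def __init__(self, d_in, d_rep=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_rep), nn.ReLU(),
                                     nn.Linear(d_rep, d_rep), nn.ReLU())
        self.head_control = nn.Linear(d_rep, 1)  # predicts y under t = 0
        self.head_treated = nn.Linear(d_rep, 1)  # predicts y under t = 1

    def forward(self, x):
        z = self.encoder(x)
        return self.head_control(z), self.head_treated(z), z

def linear_mmd(z0, z1):
    # Squared distance between representation means (simplest MMD variant).
    return (z0.mean(0) - z1.mean(0)).pow(2).sum()

def loss(model, x, t, y, alpha=1.0, beta=0.1):
    # x: covariates, t: 0/1 treatment indicator, y: observed outcomes.
    y0_hat, y1_hat, z = model(x)
    y_hat = torch.where(t.bool(), y1_hat.squeeze(-1), y0_hat.squeeze(-1))
    factual = (y_hat - y).pow(2).mean()            # fit observed outcomes
    balance = linear_mmd(z[t == 0], z[t == 1])     # representation balance
    # Encourage control-head predictions for treated units to resemble the
    # observed control-group outcomes (matched here only through the mean).
    ctrl_reg = (y0_hat[t == 1].mean() - y[t == 0].mean()).pow(2)
    return factual + alpha * balance + beta * ctrl_reg
```

The two regularizer weights control the bias-variance trade-off that the paper studies empirically across different weightings.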
Amortized learning of neural causal representations
Ke, Nan Rosemary, Wang, Jane. X., Mitrovic, Jovana, Szummer, Martin, Rezende, Danilo J.
Causal models can compactly and efficiently encode the data-generating process under all interventions and hence may generalize better under changes in distribution. These models are often represented as Bayesian networks, and learning them scales poorly with the number of variables. Moreover, these approaches cannot leverage previously learned knowledge to help with learning new causal models. To tackle these challenges, we present a novel algorithm called causal relational networks (CRN) for learning causal models using neural networks. The CRN represents causal models using continuous representations and hence can scale much better with the number of variables. These models also take in previously learned information to facilitate learning of new causal models. Finally, we propose a decoding-based metric to evaluate causal models with continuous representations. We test our method on synthetic data, achieving high accuracy and quick adaptation to previously unseen causal models.
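A heavily hedged sketch of the general idea, not the paper's exact model: a recurrent network consumes interventional samples and maintains a continuous representation of the causal model, which a decoder maps to edge probabilities that can be scored against the true graph (a decoding-based metric). The LSTM architecture, sizes, and sigmoid edge decoder are all illustrative assumptions:

```python
# Hedged sketch: continuous causal-model representation with a decoding metric.
import torch
import torch.nn as nn

class CausalRelationalNet(nn.Module):
    def __init__(self, n_vars, d_hidden=128):
        super().__init__()
        self.n_vars = n_vars
        # Input per step: one sample of all variables plus a one-hot
        # intervention mask marking which variable was intervened on.
        self.rnn = nn.LSTM(input_size=2 * n_vars, hidden_size=d_hidden,
                           batch_first=True)
        # Decode the continuous state into an n_vars x n_vars edge-score matrix.
        self.decoder = nn.Linear(d_hidden, n_vars * n_vars)

    def forward(self, samples, interventions):
        # samples: (batch, steps, n_vars); interventions: same shape, one-hot.
        x = torch.cat([samples, interventions], dim=-1)
        _, (h, _) = self.rnn(x)
        logits = self.decoder(h[-1]).view(-1, self.n_vars, self.n_vars)
        return torch.sigmoid(logits)  # edge probabilities

def edge_accuracy(probs, true_adj):
    # Decoding-based evaluation: thresholded edges vs. the true adjacency.
    return ((probs > 0.5).float() == true_adj).float().mean()
```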
Markov Random Walk Representations with Continuous Distributions
Yeang, Chen-Hsiang, Szummer, Martin
Representations based on random walks can exploit discrete data distributions for clustering and classification. We extend such representations from discrete to continuous distributions. Transition probabilities are now calculated using a diffusion equation with a diffusion coefficient that inversely depends on the data density. We relate this diffusion equation to a path integral and derive the corresponding path probability measure. The framework is useful for incorporating continuous data densities and prior knowledge.
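One plausible way to write out the construction, with the caveat that the exact parametrization (exponent on the density, normalization) is an assumption here rather than taken from the paper: transition probabilities are obtained by evolving a point mass under a diffusion whose coefficient falls off with the data density rho(x).

```latex
% Hedged reconstruction of the setup; the exact form in the paper may differ.
\frac{\partial p(x,t)}{\partial t}
  = \nabla \cdot \bigl( D(x)\, \nabla p(x,t) \bigr),
\qquad
D(x) \propto \frac{1}{\rho(x)},
\qquad
p(x,0) = \delta(x - x_0).
```

The transition probability from x_0 to x after time t is then p(x,t), and the path-integral view assigns the corresponding probability measure over continuous paths of this diffusion.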
Information Regularization with Partially Labeled Data
Szummer, Martin, Jaakkola, Tommi S.
Classification with partially labeled data requires using a large number of unlabeled examples (or an estimated marginal P(x)) to further constrain the conditional P(y|x) beyond the few available labeled examples. We formulate a regularization approach that links the marginal and the conditional in a general way. The regularization penalty measures the information that is implied about the labels over covering regions. No parametric assumptions are required, and the approach remains tractable even for continuous marginal densities P(x). We develop algorithms for solving the regularization problem for finite covers, establish a limiting differential equation, and exemplify the behavior of the new regularization approach in simple cases.
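A minimal sketch of how the penalty could be computed for a finite cover, assuming the per-region term is the mutual information between a point and its label within the region (the mean KL divergence from each P(y|x_i) to the region-average conditional), weighted by region mass; the region construction and weights are illustrative:

```python
# Hedged sketch: information-regularization penalty over a finite cover.
import numpy as np

def region_information(p_y_given_x, regions, weights=None):
    """p_y_given_x: (n, k) array of conditionals P(y|x_i);
    regions: list of index arrays, one per covering region."""
    n = p_y_given_x.shape[0]
    if weights is None:
        weights = [len(r) / n for r in regions]  # region mass, illustrative
    penalty = 0.0
    for r, w in zip(regions, weights):
        p_r = p_y_given_x[r]                     # conditionals inside region
        p_bar = p_r.mean(axis=0, keepdims=True)  # region-average conditional
        # Mean KL from each point's conditional to the region average,
        # i.e. the information the region implies about the labels.
        kl = (p_r * np.log((p_r + 1e-12) / (p_bar + 1e-12))).sum(axis=1)
        penalty += w * kl.mean()
    return penalty
```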
Partially labeled classification with Markov random walks
Szummer, Martin, Jaakkola, Tommi
Kernel Expansions with Unlabeled Examples
Szummer, Martin, Jaakkola, Tommi
Modern classification applications necessitate supplementing the few available labeled examples with unlabeled examples to improve classification performance. We present a new tractable algorithm for exploiting unlabeled examples in discriminative classification. This is achieved essentially by expanding the input vectors into longer feature vectors via both labeled and unlabeled examples. The resulting classification method can be interpreted as a discriminative kernel density estimate and is readily trained via the EM algorithm, which in this case is discriminative and achieves the optimal solution. We provide, in addition, a purely discriminative formulation of the estimation problem by appealing to the maximum entropy framework. We demonstrate that the proposed approach requires very few labeled examples for high classification accuracy.
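A minimal sketch of the kernel-expansion idea, assuming an RBF kernel and EM over per-point label distributions q_i(y); the bandwidth, iteration count, and initialization are illustrative choices rather than the paper's settings:

```python
# Hedged sketch: discriminative kernel expansion trained with EM.
import numpy as np

def rbf_weights(X_query, X_all, bandwidth=1.0):
    d2 = ((X_query[:, None, :] - X_all[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    return w / w.sum(axis=1, keepdims=True)   # P(i | x): expanded features

def fit_q(X_lab, y_lab, X_all, k, iters=50, bandwidth=1.0):
    """X_lab/y_lab: labeled points and integer labels in {0..k-1};
    X_all: labeled plus unlabeled expansion points."""
    w = rbf_weights(X_lab, X_all, bandwidth)  # (n_lab, n_all)
    q = np.full((X_all.shape[0], k), 1.0 / k) # q_i(y), uniform init
    for _ in range(iters):
        # E-step: responsibility of expansion point i for labeled example l.
        r = w * q[:, y_lab].T
        r /= r.sum(axis=1, keepdims=True)
        # M-step: reassign each point's label distribution from responsibilities.
        for y in range(k):
            q[:, y] = r[y_lab == y].sum(axis=0)
        q /= q.sum(axis=1, keepdims=True) + 1e-12
    return q

def predict_proba(X_new, X_all, q, bandwidth=1.0):
    return rbf_weights(X_new, X_all, bandwidth) @ q  # P(y | x)
```

Prediction is a kernel-weighted average of the learned q_i, which matches the expanded-feature reading above: the normalized kernel weights over all labeled and unlabeled points form the longer feature vector for x.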