Singh, Rahul
Inference of collective Gaussian hidden Markov models
Singh, Rahul, Chen, Yongxin
We consider inference problems for a class of continuous-state collective hidden Markov models, where the data are recorded in aggregate (collective) form generated by a large population of individuals following the same dynamics. We propose an aggregate inference algorithm called the collective Gaussian forward-backward algorithm, which extends the recently proposed Sinkhorn belief propagation algorithm to models characterized by Gaussian densities. Our algorithm enjoys a convergence guarantee. In addition, it reduces to the standard Kalman filter when the observations are generated by a single individual. The efficacy of the proposed algorithm is demonstrated through multiple experiments.
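As a point of reference, the abstract notes that the collective algorithm reduces to the standard Kalman filter when the observations come from a single individual. Below is a minimal sketch of that single-individual special case in Python with NumPy; the model matrices A, C, Q, R and the toy data are illustrative assumptions, not quantities from the paper.

import numpy as np

def kalman_filter(y, A, C, Q, R, mu0, P0):
    """Standard Kalman filter: the single-individual special case noted above.
    y: (T, dy) observations; A, C, Q, R: linear-Gaussian model matrices (assumed known)."""
    mu, P = mu0, P0
    means, covs = [], []
    for t in range(len(y)):
        # predict step
        mu = A @ mu
        P = A @ P @ A.T + Q
        # update step
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        mu = mu + K @ (y[t] - C @ mu)
        P = (np.eye(len(mu)) - K @ C) @ P
        means.append(mu)
        covs.append(P)
    return np.array(means), np.array(covs)

# Toy example with hypothetical 2-state dynamics and scalar observations.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), 0.1 * np.eye(1)
rng = np.random.default_rng(0)
x, ys = np.zeros(2), []
for _ in range(50):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
    ys.append(C @ x + rng.multivariate_normal(np.zeros(1), R))
means, covs = kalman_filter(np.array(ys), A, C, Q, R, np.zeros(2), np.eye(2))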
Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy
Agarwal, Anish, Singh, Rahul
Even the most carefully curated economic data sets have variables that are noisy, missing, discretized, or privatized. The standard workflow for empirical research involves data cleaning followed by data analysis that typically ignores the bias and variance consequences of data cleaning. We formulate a semiparametric model for causal inference with corrupted data to encompass both data cleaning and data analysis. We propose a new end-to-end procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove root-n consistency, Gaussian approximation, and semiparametric efficiency for our estimator of the causal parameter by finite sample arguments. Our key assumption is that the true covariates are approximately low rank. In our analysis, we provide nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. We verify the coverage of the data cleaning-adjusted confidence intervals in simulations.
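The key assumption above is that the true covariates are approximately low rank. A minimal sketch of one standard matrix-completion-style cleaning step follows: missing entries are inverse-probability rescaled and the singular values are soft-thresholded. This illustrates the low-rank cleaning idea only; the threshold tau, the toy corruption model, and the function name clean_covariates are assumptions, not the paper's procedure.

import numpy as np

def clean_covariates(X_obs, tau):
    """Sketch: denoise/impute a corrupted covariate matrix assumed to be
    approximately low rank, via singular value soft-thresholding.
    X_obs: (n, p) matrix with np.nan for missing entries; tau: threshold level."""
    mask = ~np.isnan(X_obs)
    p_obs = mask.mean()                             # estimated observation probability
    X_fill = np.where(mask, X_obs, 0.0) / p_obs     # inverse-probability rescaling
    U, s, Vt = np.linalg.svd(X_fill, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)             # soft-threshold singular values
    return (U * s_shrunk) @ Vt

# Toy corruption of a rank-2 matrix with noise and 20% missingness.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 30))
X_obs = X_true + 0.5 * rng.normal(size=X_true.shape)
X_obs[rng.random(X_obs.shape) < 0.2] = np.nan
X_hat = clean_covariates(X_obs, tau=10.0)
print(np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true))  # relative cleaning error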
A Simple and General Debiased Machine Learning Theorem with Finite Sample Guarantees
Chernozhukov, Victor, Newey, Whitney K., Singh, Rahul
Debiased machine learning is a meta-algorithm based on bias correction and sample splitting to calculate confidence intervals for functionals (i.e., scalar summaries) of machine learning algorithms. For example, an analyst may desire the confidence interval for a treatment effect estimated with a neural network. We provide a nonasymptotic debiased machine learning theorem that encompasses any global or local functional of any machine learning algorithm that satisfies a few simple, interpretable conditions. Formally, we prove consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. The rate of convergence is root-n for global functionals, and it degrades gracefully for local functionals. Our results culminate in a simple set of conditions that an analyst can use to translate modern learning theory rates into traditional statistical inference. The conditions reveal a new double robustness property for ill-posed inverse problems.
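For the running example in the abstract (a confidence interval for a treatment effect estimated with machine learning), a minimal sketch of cross-fitting with a bias-corrected (doubly robust) moment is shown below. The random-forest nuisance learners, the propensity clipping at 0.01, and the toy data-generating process are illustrative assumptions, not choices made in the paper.

import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import KFold

def dml_ate(Y, D, X, n_splits=2, seed=0):
    """Cross-fitted, bias-corrected (AIPW) estimate of the average treatment effect,
    with a normal-approximation confidence interval."""
    n = len(Y)
    psi = np.zeros(n)
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        mu1 = RandomForestRegressor(random_state=seed).fit(X[train][D[train] == 1], Y[train][D[train] == 1])
        mu0 = RandomForestRegressor(random_state=seed).fit(X[train][D[train] == 0], Y[train][D[train] == 0])
        pi = RandomForestClassifier(random_state=seed).fit(X[train], D[train])
        m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
        p = np.clip(pi.predict_proba(X[test])[:, 1], 0.01, 0.99)
        # doubly robust / bias-corrected score on the held-out fold
        psi[test] = (m1 - m0
                     + D[test] * (Y[test] - m1) / p
                     - (1 - D[test]) * (Y[test] - m0) / (1 - p))
    theta = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)
    return theta, (theta - 1.96 * se, theta + 1.96 * se)

# Toy data: linear outcome model with a known treatment effect of 1.0.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 1.0 * D + X[:, 0] + rng.normal(size=2000)
print(dml_ate(Y, D, X))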
Robustness Tests of NLP Machine Learning Models: Search and Semantically Replace
Singh, Rahul, Jindal, Karan, Yu, Yufei, Yang, Hanyu, Joshi, Tarun, Campbell, Matthew A., Shoumaker, Wayne B.
This paper proposes a strategy to assess the robustness of different machine learning models that involve natural language processing (NLP). The overall approach relies upon a Search and Semantically Replace strategy that consists of two steps: (1) Search, which identifies important parts of the text; (2) Semantically Replace, which finds replacements for those important parts and constrains the replacements to semantically similar tokens. We introduce different types of Search and Semantically Replace methods designed specifically for particular types of machine learning models. We also investigate the effectiveness of this strategy and provide a general framework for assessing a variety of machine learning models. Finally, we provide an empirical comparison of robustness performance among three different model types, each with a different text representation.
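A minimal sketch of the two-step idea follows: a Search step that ranks tokens by the drop in the model's predicted probability when each token is removed, and a Semantically Replace step that swaps an important token for a semantically similar word. The tiny bag-of-words classifier, the leave-one-out importance score, and the synonyms map are illustrative assumptions; the paper designs Search and Semantically Replace methods specific to each model type.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative classifier; the paper assesses arbitrary NLP models.
texts = ["great movie loved it", "terrible movie hated it",
         "wonderful acting great plot", "awful plot boring acting"]
labels = [1, 0, 1, 0]
model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

def search(text, model):
    """Search step (sketch): rank tokens by the drop in predicted probability
    of the originally predicted label when the token is removed."""
    probs = model.predict_proba([text])[0]
    label, base = probs.argmax(), probs.max()
    tokens, drops = text.split(), []
    for i in range(len(tokens)):
        perturbed = " ".join(tokens[:i] + tokens[i + 1:])
        drops.append(base - model.predict_proba([perturbed])[0][label])
    return sorted(zip(tokens, drops), key=lambda t: -t[1])

def semantically_replace(text, important_token, synonyms):
    """Semantically Replace step (sketch): swap an important token for a
    semantically similar word from a hypothetical synonym map."""
    return " ".join(synonyms.get(w, w) if w == important_token else w
                    for w in text.split())

synonyms = {"great": "excellent", "terrible": "dreadful"}  # illustrative only
ranked = search("great movie loved it", model)
print(ranked[0], semantically_replace("great movie loved it", ranked[0][0], synonyms))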
Debiased Kernel Methods
Singh, Rahul
I propose a practical procedure based on bias correction and sample splitting to calculate confidence intervals for functionals of generic kernel methods, i.e., nonparametric estimators learned in a reproducing kernel Hilbert space (RKHS). For example, an analyst may desire confidence intervals for functionals of kernel ridge regression. I propose a bias correction that mirrors kernel ridge regression. The framework encompasses (i) evaluations over discrete domains, (ii) derivatives over continuous domains, (iii) treatment effects of discrete treatments, and (iv) incremental treatment effects of continuous treatments. For each of the target quantities (i)-(iv), I prove root-n consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. I show that the classic assumptions of RKHS learning theory also imply inference.
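The nuisance estimator at the center of the abstract, kernel ridge regression, has a closed form. A minimal sketch of that closed form, evaluated at a point as in case (i), is given below; the Gaussian kernel, bandwidth, ridge penalty, and toy data are illustrative assumptions, and the bias-correction step that yields the confidence interval is omitted here.

import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def krr_fit(X, Y, lam=1e-2, bandwidth=1.0):
    """Kernel ridge regression: alpha = (K + n*lam*I)^{-1} Y."""
    n = len(Y)
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), Y)
    return lambda Xnew: gaussian_kernel(Xnew, X, bandwidth) @ alpha

# Toy data: evaluate the fitted regression at a point (case (i) in the abstract).
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 1))
Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)
f_hat = krr_fit(X, Y)
print(f_hat(np.array([[1.0]])), np.sin(1.0))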
Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers
Xiong, Guojun, Yan, Gang, Singh, Rahul, Li, Jian
With the increasing demand for large-scale training of machine learning models, consensus-based distributed optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each worker maintains a local estimate of the optimal parameter vector and iteratively updates it by waiting for and averaging the estimates obtained from its neighbors, then correcting it on the basis of its local dataset. However, the synchronization phase can be time consuming due to the need to wait for \textit{stragglers}, i.e., slower workers. An efficient way to mitigate this effect is to let each worker wait only for updates from its fastest neighbors before updating its local parameter. The remaining neighbors are called \textit{backup workers}. To minimize the global training time over the network, we propose a fully distributed algorithm to dynamically determine the number of backup workers for each worker. We show that our algorithm achieves a linear speedup for convergence (i.e., convergence performance increases linearly with the number of workers). We conduct extensive experiments on MNIST and CIFAR-10 to verify our theoretical results.
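A minimal simulation sketch of the core scheduling idea follows: each worker averages its estimate with those of its fastest neighbors, treating the remaining neighbors as backup workers, and then applies a local correction. The fully connected topology, exponential response times, fixed number of waited-for neighbors, and quadratic toy losses are all simulated stand-ins; the paper's algorithm instead chooses the number of backup workers dynamically.

import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, n_wait = 8, 5, 3              # wait only for the 3 fastest neighbors
params = rng.normal(size=(n_workers, dim))    # each worker's local estimate

def local_gradient(i, theta):
    """Stand-in for a gradient on worker i's local dataset (quadratic toy loss)."""
    target = np.full(dim, float(i))
    return theta - target

for step in range(100):
    new_params = params.copy()
    for i in range(n_workers):
        neighbors = [j for j in range(n_workers) if j != i]
        delays = rng.exponential(size=len(neighbors))          # simulated response times
        fastest = [neighbors[j] for j in np.argsort(delays)[:n_wait]]
        # average own estimate with the fastest neighbors; the rest act as backup workers
        avg = np.mean(params[[i] + fastest], axis=0)
        new_params[i] = avg - 0.1 * local_gradient(i, avg)     # local correction
    params = new_params

print(params.mean(axis=0))   # consensus drifts toward the average of the local optima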
Adversarial Estimation of Riesz Representers
Chernozhukov, Victor, Newey, Whitney, Singh, Rahul, Syrgkanis, Vasilis
We provide an adversarial approach to estimating Riesz representers of linear functionals within arbitrary function spaces. We prove oracle inequalities based on the localized Rademacher complexity of the function space used to approximate the Riesz representer and the approximation error. These inequalities imply fast finite sample mean-squared-error rates for many function spaces of interest, such as high-dimensional sparse linear functions, neural networks, and reproducing kernel Hilbert spaces. Our approach offers a new way of estimating Riesz representers with a plethora of recently introduced machine learning techniques. We show how our estimator can be used in the context of debiasing structural/causal parameters in semiparametric models, for automated orthogonalization of moment equations, and for estimating the stochastic discount factor in the context of asset pricing.
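As a point of reference for what a Riesz representer estimate looks like, a closely related (non-adversarial) least-squares criterion, E[alpha(W)^2 - 2 m(W; alpha)], whose population minimizer is also the Riesz representer, admits a closed form over a finite linear dictionary. The sketch below implements that simpler criterion for the average-treatment-effect functional m(W; f) = f(1, X) - f(0, X); it is not the paper's adversarial estimator, and the dictionary and toy design are illustrative assumptions.

import numpy as np

def riesz_linear(D, X, basis):
    """Closed-form minimizer of the empirical loss E[alpha(W)^2 - 2 m(W; alpha)]
    over a linear dictionary b(D, X), with m(W; f) = f(1, X) - f(0, X)."""
    B = basis(D, X)                                               # n x k dictionary at the data
    M = basis(np.ones_like(D), X) - basis(np.zeros_like(D), X)    # m applied to each basis function
    G = B.T @ B / len(D)
    h = M.mean(axis=0)
    theta = np.linalg.solve(G + 1e-6 * np.eye(len(h)), h)
    return lambda d, x: basis(d, x) @ theta

def basis(D, X):
    """Illustrative dictionary: intercept, treatment, covariates, and interactions."""
    return np.column_stack([np.ones_like(D), D, X, D[:, None] * X])

# Toy check: with a logistic propensity, the true representer is D/pi - (1-D)/(1-pi).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
pi = 1 / (1 + np.exp(-X[:, 0]))
D = rng.binomial(1, pi).astype(float)
alpha_hat = riesz_linear(D, X, basis)
alpha_true = D / pi - (1 - D) / (1 - pi)
print(np.corrcoef(alpha_hat(D, X), alpha_true)[0, 1])   # agreement of the two representers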
Kernel Methods for Unobserved Confounding: Negative Controls, Proxies, and Instruments
Singh, Rahul
Negative control is a strategy for learning the causal relationship between treatment and outcome in the presence of unmeasured confounding. The treatment effect can nonetheless be identified if two auxiliary variables are available: a negative control treatment (which has no effect on the actual outcome), and a negative control outcome (which is not affected by the actual treatment). These auxiliary variables can also be viewed as proxies for a traditional set of control variables, and they bear resemblance to instrumental variables. I propose a new family of nonparametric algorithms for learning treatment effects with negative controls. I consider treatment effects of the population, of sub-populations, and of alternative populations. I allow for data that may be discrete or continuous, and low-, high-, or infinite-dimensional. I impose the additional structure of the reproducing kernel Hilbert space (RKHS), a popular nonparametric setting in machine learning. I prove uniform consistency and provide finite sample rates of convergence. I evaluate the estimators in simulations.
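The identification logic underlying such estimators can be summarized, in notation common to the negative control / proximal literature (treatment A, outcome Y, negative control treatment Z, negative control outcome W), by an integral equation for an outcome bridge function h and the resulting counterfactual mean. The display below is a standard summary of that route under the usual conditional-independence and completeness conditions from that literature, not a statement of the paper's estimators.

\[
\mathbb{E}[\,Y \mid A, Z\,] \;=\; \mathbb{E}[\,h(A, W) \mid A, Z\,]
\qquad\text{and}\qquad
\mathbb{E}\!\left[\,Y^{(a)}\,\right] \;=\; \mathbb{E}[\,h(a, W)\,].
\]

The first display is an integral equation that the bridge function h must solve; the second says that, once h is learned, averaging h(a, W) over the observed negative control outcomes recovers the counterfactual mean for treatment level a.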
Unwrapping The Black Box of Deep ReLU Networks: Interpretability, Diagnostics, and Simplification
Sudjianto, Agus, Knauth, William, Singh, Rahul, Yang, Zebin, Zhang, Aijun
Deep neural networks (DNNs) have achieved great success in learning complex patterns with strong predictive power, but they are often regarded as "black box" models lacking a sufficient level of transparency and interpretability. It is important to demystify DNNs with rigorous mathematics and practical tools, especially when they are used for mission-critical applications. This paper aims to unwrap the black box of deep ReLU networks through local linear representation, which uses the activation pattern to disentangle the complex network into an equivalent set of local linear models (LLMs). We develop a convenient LLM-based toolkit for the interpretability, diagnostics, and simplification of a pre-trained deep ReLU network. We propose the local linear profile plot and other visualization methods for interpretation and diagnostics, and an effective merging strategy for network simplification. The proposed methods are demonstrated by simulation examples, benchmark datasets, and a real case study in home lending credit risk assessment.
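A minimal sketch of the local linear representation is given below: for a given input, the ReLU activation pattern fixes an affine map, which can be read off by chaining the layer weights through diagonal activation masks, and the resulting local linear model reproduces the network output exactly on that activation region. The tiny randomly weighted network is an illustrative assumption, not the paper's toolkit.

import numpy as np

rng = np.random.default_rng(0)
# Toy ReLU network: input(3) -> hidden(5) -> hidden(4) -> output(1)
Ws = [rng.normal(size=(5, 3)), rng.normal(size=(4, 5)), rng.normal(size=(1, 4))]
bs = [rng.normal(size=5), rng.normal(size=4), rng.normal(size=1)]

def forward(x):
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = np.maximum(W @ h + b, 0.0)
    return Ws[-1] @ h + bs[-1]

def local_linear_model(x):
    """Read off the local linear model (A, c) with f(x') = A @ x' + c, valid on the
    activation region containing x, by chaining weights through activation masks."""
    A, c = np.eye(len(x)), np.zeros(len(x))
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        z = W @ h + b
        mask = (z > 0).astype(float)         # activation pattern at this layer
        A = (W * mask[:, None]) @ A          # zero out rows of inactive units
        c = mask * (W @ c + b)
        h = np.maximum(z, 0.0)
    return Ws[-1] @ A, Ws[-1] @ c + bs[-1]

x = rng.normal(size=3)
A, c = local_linear_model(x)
print(forward(x), A @ x + c)   # identical at x, and on its whole activation region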
Filtering for Aggregate Hidden Markov Models with Continuous Observations
Zhang, Qinsheng, Singh, Rahul, Chen, Yongxin
We consider a class of filtering problems for large populations in which each individual is modeled by the same hidden Markov model (HMM). In this paper, we focus on aggregate inference problems in HMMs with a discrete state space and a continuous observation space. The continuous observations are aggregated such that the individuals are indistinguishable from the measurements. We propose an aggregate inference algorithm called the continuous observation collective forward-backward algorithm. It extends the recently proposed collective forward-backward algorithm for aggregate inference in HMMs with discrete observations to the case of continuous observations. The efficacy of this algorithm is illustrated through several numerical experiments.
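For reference, a minimal sketch of the single-individual building block, the standard forward-backward recursions for a discrete-state HMM with Gaussian observations, is given below. The collective algorithm in the paper instead operates on aggregate distributions; the two-state toy chain and Gaussian emission parameters here are illustrative assumptions.

import numpy as np
from scipy.stats import norm

def forward_backward(y, pi0, P, means, sigmas):
    """Forward-backward smoothing for a discrete-state HMM with Gaussian emissions.
    y: (T,) observations; pi0: initial distribution; P: transition matrix;
    means, sigmas: per-state Gaussian emission parameters."""
    T, S = len(y), len(pi0)
    L = norm.pdf(y[:, None], loc=means[None, :], scale=sigmas[None, :])  # (T, S) likelihoods
    alpha, beta = np.zeros((T, S)), np.ones((T, S))
    alpha[0] = pi0 * L[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                      # forward pass (normalized)
        alpha[t] = (alpha[t - 1] @ P) * L[t]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):             # backward pass (normalized)
        beta[t] = P @ (L[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)   # smoothed state marginals

# Toy two-state chain with well-separated Gaussian emissions.
pi0 = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1], [0.2, 0.8]])
means, sigmas = np.array([0.0, 3.0]), np.array([1.0, 1.0])
rng = np.random.default_rng(0)
states, y = [0], []
for t in range(100):
    states.append(rng.choice(2, p=P[states[-1]]))
    y.append(rng.normal(means[states[-1]], sigmas[states[-1]]))
gamma = forward_backward(np.array(y), pi0, P, means, sigmas)
print((gamma.argmax(axis=1) == np.array(states[1:])).mean())   # smoothing accuracy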