Inference at the data's edge: Gaussian processes for modeling and inference under model-dependency, poor overlap, and extrapolation
Cho, Soonhong, Kim, Doeun, Hazlett, Chad
The Gaussian Process (GP) is a highly flexible non-linear regression approach that provides a principled way of handling our uncertainty over predicted (counterfactual) values. It does so by computing a posterior distribution over each predicted point as a function of a chosen model space and the observed data, in contrast to conventional approaches that effectively compute uncertainty estimates conditional on placing full faith in a fitted model. This is especially valuable under conditions of extrapolation or weak overlap, where model dependency poses a severe threat. We first offer an accessible explanation of GPs, and provide an implementation suitable for social science inference problems. In doing so we reduce the number of user-chosen hyperparameters from three to zero. We then illustrate the settings in which GPs can be most valuable: those where conventional approaches have poor properties due to model dependency or extrapolation in data-sparse regions. Specifically, we apply it to (i) comparisons in which treated and control groups have poor covariate overlap; (ii) interrupted time-series designs, where models are fitted prior to an event but extrapolated after it; and (iii) regression discontinuity, which depends on model estimates taken at or just beyond the edge of their supporting data.
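To make the idea of a posterior distribution over each predicted point concrete, the following is a minimal sketch of textbook GP regression in NumPy. It is not the authors' implementation: the squared-exponential kernel, the fixed values of `lengthscale`, `scale`, and `noise_var`, the simulated data, and the function names `rbf_kernel` and `gp_posterior` are all assumptions chosen for illustration. It shows only that the posterior variance is small where training data are dense and returns toward the prior variance at a point far outside their range.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, scale=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return scale * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.01,
                 lengthscale=1.0, scale=1.0):
    """Posterior mean and variance of the latent function at x_test.

    Hyperparameters are fixed by hand here (an assumption for this sketch).
    """
    K = rbf_kernel(x_train, x_train, lengthscale, scale)
    K += noise_var * np.eye(len(x_train))          # observation noise
    K_s = rbf_kernel(x_train, x_test, lengthscale, scale)
    K_ss_diag = np.full(len(x_test), scale)        # k(x, x) = scale for RBF

    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                           # posterior mean
    v = np.linalg.solve(L, K_s)
    var = K_ss_diag - np.sum(v**2, axis=0)         # posterior variance
    return mean, var

# Simulated data (illustrative only): observations confined to [-2, 2].
rng = np.random.default_rng(0)
x_train = np.linspace(-2.0, 2.0, 30)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(30)

# One test point inside the data, one far beyond its edge.
x_test = np.array([0.0, 6.0])
mean, var = gp_posterior(x_train, y_train, x_test)
print("posterior variance inside the data:", var[0])
print("posterior variance when extrapolating:", var[1])
```

Under these assumptions, the variance at the extrapolated point is close to the prior variance `scale`, reflecting that the data say almost nothing there, whereas a fitted parametric model would report uncertainty conditional on its functional form holding everywhere.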
Jul-15-2024