Small Area Estimation with Random Forests and the LASSO

Michal, Victoire; Wakefield, Jon; Schmidt, Alexandra M.; Cavanaugh, Alicia; Robinson, Brian; Baumgartner, Jill

arXiv.org Machine Learning 

We consider random forests and LASSO methods for model-based small area estimation when the number of areas with sampled data is a small fraction of the total areas for which estimates are required. Abundant auxiliary information is available for the sampled areas, from the survey, and for all areas, from an external source, and the goal is to use auxiliary variables to predict the outcome of interest. We compare area-level random forests and LASSO approaches to a frequentist forward variable selection approach and a Bayesian shrinkage method. This work is motivated by Ghanaian data available from the sixth Living Standard Survey (GLSS) and the 2010 Population and Housing Census. We estimate the areal mean household log consumption using both datasets. The outcome variable is measured only in the GLSS, for 3% of all the areas (136 out of 5019), and more than 170 potential covariates are available from both datasets. Among the four modelling methods considered, the Bayesian shrinkage method performed best in terms of bias, MSE, and prediction interval coverage and scores, as assessed through a cross-validation study. We find substantial between-area variation, with the areal point estimates of log consumption showing a 1.3-fold variation across the GAMA region. The western areas are the poorest, while the Accra Metropolitan Area district contains the richest areas.

In 2015, the United Nations (UN) released their 2030 agenda for sustainable development goals (SDGs) consisting of 17 goals, the first of which was to end poverty worldwide (Resolution, General Assembly and others, 2015). For their first SDG, the UN made seven guidelines explicit, including the implementation of "poverty eradication policies" at a disaggregated level. To that end, producing reliable and fine-grained pictures of socioeconomic status and income inequality is fundamental to help decision makers prioritise and target certain areas.
Such detailed maps also help local communities understand their situation relative to their neighbours, which in turn helps when planning interventions (Bedi et al., 2007). In Ghana, household surveys are conducted every few years to measure the living conditions of households across Ghanaian regions and districts and to monitor poverty.
