Keeping greed good: sparse regression under design uncertainty with application to biomass characterization
Biagioni, David J., Elmore, Ryan, Jones, Wesley
This paper is motivated by the practical problem of how to meaningfully perform sparse regression when the predictor variables are observed with measurement error or some source of uncertainty. We will refer to this error or noise as design uncertainty to emphasize that perturbations in the design matrix may arise from a number of random sources unrelated to experimental or measurement error per se. Recent work in this area has just begun to address the issue of sparse regression under design uncertainty from a theoretical point of view. We are primarily interested in describing an approach that, while theoretically justifiable, is essentially pragmatic and broadly applicable. In short, we argue that greed - a basic feature of many sparsity promoting algorithms - is indeed good [Tropp, 2004], so long as the design data is scaled by the uncertainty variances. We demonstrate the efficacy of scaling from several points of view and validate it empirically with a biomass characterization data set using two of the most widely used sparse algorithms: least angle regression (LARS) and the Dantzig selector (DS). Our work was motivated by an example from a biomass characterization experiment related to work at the National Renewable Energy Laboratory. The example is described in detail in Section 4 and contains repeated measurements of mass spectral (design, or predictor) and sugar mass fraction (response) values within each switchgrass sample. The domain scientists' goal was to find a small subset of masses in the spectrum that could be used to predict sugar mass fraction.
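As a rough illustration of the scaling idea in the abstract, the sketch below estimates a per-predictor noise level from repeated measurements, divides each design column by it, and then runs a simple greedy forward selection. It is not the authors' procedure: the abstract names LARS and the Dantzig selector, whereas this uses a plain orthogonal-matching-pursuit-style loop as a stand-in. Reading "scaled by the uncertainty variances" as dividing each column by its estimated noise standard deviation is also an assumption, and the function names `pooled_noise_sd` and `scaled_greedy_fit` are hypothetical.

```python
import numpy as np


def pooled_noise_sd(replicates):
    """Estimate per-predictor noise SD from repeated measurements.

    replicates has shape (n_samples, n_reps, n_predictors). The within-sample
    variance is averaged over samples (an assumed pooling scheme).
    """
    within_var = np.asarray(replicates, dtype=float).var(axis=1, ddof=1)
    return np.sqrt(within_var.mean(axis=0))


def scaled_greedy_fit(X, noise_sd, y, n_terms):
    """Greedy forward selection on columns scaled by 1 / noise_sd.

    Columns are deliberately NOT renormalized to unit length afterwards,
    so noisier predictors are penalized in the selection step.
    Returns coefficients in the original (unscaled) units and the support.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    noise_sd = np.asarray(noise_sd, dtype=float)

    Xs = X / noise_sd                      # assumed form of the scaling
    p = Xs.shape[1]
    chosen = np.zeros(p, dtype=bool)
    support = []
    residual = y.copy()
    beta = np.zeros(0)

    for _ in range(n_terms):
        score = np.abs(Xs.T @ residual)    # greedy criterion
        score[chosen] = -np.inf            # never pick a column twice
        j = int(np.argmax(score))
        chosen[j] = True
        support.append(j)
        # Refit by least squares on the selected columns.
        beta, *_ = np.linalg.lstsq(Xs[:, support], y, rcond=None)
        residual = y - Xs[:, support] @ beta

    coef = np.zeros(p)
    coef[support] = beta / noise_sd[support]   # undo the column scaling
    return coef, support
```

In use, `pooled_noise_sd` would be applied to the replicated spectra and its output passed as `noise_sd`, with `X` holding the per-sample mean spectra. Skipping the usual unit-norm standardization is the design choice that lets the scaling affect which predictors are picked.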
Jul-8-2012
- Country:
- North America > United States > Colorado (0.28)
- Genre:
- Research Report (0.64)
- Industry:
- Energy > Renewable (0.68)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
- Technology: