Supplementary Material
S2.2 Variance of importance weights The importance-sampled estimate of the log-likelihood used to retrain the oracle (Equation 17) is unbiased, but may have high variance due to the variance of the importance weights. Let L_β : X × ℝ → ℝ denote a pertinent loss function induced by the oracle parameters, β (e.g., the squared error L_β(x, y) = (E_β[y | x] − y)²). While the bound, L, on L_β may be restrictive in general, for any given application one may be able to use domain-specific knowledge to estimate L. CbAS naturally controls the importance weight variance. Design procedures that leverage a trust region can naturally bound the variance of the importance weights.
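To make the variance concern concrete, the following is a minimal sketch, not the authors' implementation. It assumes the importance weights are the ratio of a search-model density to the training density, and it uses one-dimensional Gaussians for both purely for illustration. The helper names (`normal_pdf`, `importance_weights`, `effective_sample_size`, `sample_variance`) are hypothetical. The effective sample size, (Σw)² / Σw², is a common diagnostic for weight degeneracy and is swapped in here as a convenient summary; the excerpt itself does not mention it.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian (illustrative stand-in for both distributions)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(xs, target, proposal):
    """Assumed form: w(x) = target density / proposal density, each given as (mean, std)."""
    return [normal_pdf(x, *target) / normal_pdf(x, *proposal) for x in xs]


def effective_sample_size(weights):
    """(sum of w)^2 / (sum of w^2): equals n for uniform weights, shrinks as weights degenerate."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def sample_variance(values):
    """Unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


rng = random.Random(0)
proposal = (0.0, 1.0)  # hypothetical training distribution
xs = [rng.gauss(*proposal) for _ in range(2000)]

near = importance_weights(xs, (0.5, 1.0), proposal)  # search model close to the training data
far = importance_weights(xs, (3.0, 1.0), proposal)   # search model far from the training data

for name, ws in (("near", near), ("far", far)):
    print(name, "variance:", sample_variance(ws), "ESS:", effective_sample_size(ws))
```

In this toy setting, moving the search model away from the training distribution inflates the weight variance and collapses the effective sample size, which is the situation a trust region is meant to prevent.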
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Supplementary Material S1 Pseudocode Algorithm 1 gives pseudocode for autofocusing a broad class of model-based optimization (MBO) [...] "E-step" (Steps 1 and 2 in Algorithm 1) and a weighted maximum likelihood estimation (MLE) "M-step" (Step 3; see [...] One may use these in a number of different ways. The following observation is due to Chebyshev's inequality. One can use Proposition S2.1 to construct a confidence interval on, for example, the expected squared [...] Note that 1) the bound in Proposition S2.1 is [...] We used CbAS as follows.
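The weighted MLE "M-step" mentioned above can be illustrated with a small sketch. This is not the pseudocode of Algorithm 1: it assumes a hypothetical linear-Gaussian oracle, y ~ N(slope · x + intercept, variance), for which the weighted maximum likelihood estimate has a closed form (weighted least squares). The function name `weighted_linear_mle` and the toy data are assumptions made for illustration only.

```python
def weighted_linear_mle(xs, ys, ws):
    """Weighted MLE for an assumed linear-Gaussian model y ~ N(slope*x + intercept, var).

    Maximizing sum_i w_i * log N(y_i | slope*x_i + intercept, var) reduces to
    weighted least squares for (slope, intercept) and a weighted mean of
    squared residuals for var.
    """
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    var = sum(w * (y - slope * x - intercept) ** 2 for w, x, y in zip(ws, xs, ys)) / sw
    return slope, intercept, var


# Toy data from y = x^2: a linear model cannot fit it everywhere at once.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]

# Uniform weights: the fit averages over the whole training range.
slope_uniform, _, _ = weighted_linear_mle(xs, ys, [1.0] * len(xs))

# Weights concentrated on x >= 0 (a hypothetical search region): the refit
# model spends its capacity where the weights say it matters.
slope_focused, _, _ = weighted_linear_mle(xs, ys, [0.01, 0.01, 1.0, 1.0, 1.0])

print("uniform slope:", slope_uniform)
print("focused slope:", slope_focused)
```

The design point of the sketch is that reweighting the same labeled data, without collecting anything new, changes which region of the input space the refit model is accurate in.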
Autofocused oracles for model-based design
Fannjiang, Clara, Listgarten, Jennifer
Data-driven design is making headway into a number of application areas, including protein, small-molecule, and materials engineering. The design goal is to construct an object with desired properties, such as a protein that binds to a therapeutic target, or a superconducting material with a higher critical temperature than previously observed. To that end, costly experimental measurements are being replaced with calls to high-capacity regression models trained on labeled data, which can be leveraged in an in silico search for design candidates. However, the design goal necessitates moving into regions of the design space beyond where such models were trained. Therefore, one can ask: should the regression model be altered as the design algorithm explores the design space, in the absence of new data? Herein, we answer this question in the affirmative. In particular, we (i) formalize the data-driven design problem as a non-zero-sum game, (ii) develop a principled strategy for retraining the regression model as the design algorithm proceeds, which we refer to as autofocusing, and (iii) demonstrate the promise of autofocusing empirically.