Reviews: The Everlasting Database: Statistical Validity at a Fair Price

Neural Information Processing Systems 

This paper studies the problem of answering a (possibly infinite) sequence of (adaptive and non-adaptive) statistical queries without overfitting. Queries trigger the acquisition of fresh data when the mechanism determines that overfitting is likely, so adaptive queries necessitate new data. By continually acquiring fresh data as needed, the mechanism can (whp) guarantee accuracy in perpetuity. Moreover, by passing on the "cost" of data acquisition to queries that trigger it, the mechanism guarantees that (whp) non-adaptive queries pay cost O(log(# queries)) while adaptive queries pay cost O(sqrt(# queries)). Suggested applications are the normal ones for adaptive data analysis: ML competition leaderboards and scientific discovery.