Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains