Unlearning as Ablation: Toward a Falsifiable Benchmark for Generative Scientific Discovery