SURF: A Generalization Benchmark for GNNs Predicting Fluid Dynamics