Supplementary: Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning

A Analyzing the model bias for selecting train-test splits