Supplementary: Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning

A Analyzing the model bias for selecting train-test splits

Neural Information Processing Systems 

These settings are used throughout our study. In Tab. 1 we show the measured FID scores between each [...]

For each dataset we show examples for an easy, medium and hard train-test split. Tab. 2 first illustrates the FID scores for all pairwise combinations [...]

However, the fact that FID scores are relatively close to one another despite large semantic differences between datasets may indicate that FID based on our utilised FID estimator (Sec. [...]

This section provides additional results for the experiments presented in Sec. 4 in the main paper. To this end, we provide the exact performance values used to visualize Figure 1 in the main paper in Tab. [...]