In search of robust measures of generalization