A Evaluation Metrics

A.1 Expected Calibration Error (ECE)

Expected calibration error (ECE) [

Neural Information Processing Systems 

The difference between acc and conf can be intuitively seen as the deviation of the outputs from the diagonal in Figure 1.

The higher the accuracy of the predictions, the lower the BS.

The details of these datasets are summarized in Table 5. Our data are public and do not contain personally identifiable information or offensive content.

The learning rate is 0.01 and the maximum number of iterations is 50.
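As a rough illustration of the two metrics in this appendix, the sketch below computes a binned ECE as the sample-weighted average of |acc - conf| over confidence bins, together with a multi-class Brier score. Several things here are assumptions rather than details taken from the text: the function names `expected_calibration_error` and `brier_score` are hypothetical, the 10 equal-width bins are a common default, and BS is read as the Brier score in its sum-over-classes form.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / n) * |acc(bin) - conf(bin)|.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: 1 if the prediction was right, 0 otherwise.
    n_bins: number of equal-width bins (assumed default, not from the paper).
    """
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    counts = [0] * n_bins       # number of samples per bin
    conf_sums = [0.0] * n_bins  # summed confidence per bin
    hit_sums = [0] * n_bins     # number of correct predictions per bin

    for c, hit in zip(confidences, correct):
        # Bin m covers [m/n_bins, (m+1)/n_bins); confidence 1.0 goes to the last bin.
        m = min(int(c * n_bins), n_bins - 1)
        counts[m] += 1
        conf_sums[m] += c
        hit_sums[m] += hit

    ece = 0.0
    for m in range(n_bins):
        if counts[m] == 0:
            continue  # empty bins contribute nothing
        acc = hit_sums[m] / counts[m]
        conf = conf_sums[m] / counts[m]
        ece += (counts[m] / n) * abs(acc - conf)
    return ece


def brier_score(probs, labels):
    """Mean over samples of the squared distance between the predicted
    probability vector and the one-hot encoding of the true label."""
    if not probs:
        raise ValueError("need at least one prediction")
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(probs)


if __name__ == "__main__":
    # Toy inputs, invented purely for illustration.
    print(expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0]))
    print(brier_score([[1.0, 0.0], [0.5, 0.5]], [0, 1]))
```

In a reliability diagram, each non-empty bin's `abs(acc - conf)` is the vertical distance of that bin from the diagonal, so a perfectly calibrated model yields an ECE of zero under this binning.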