How to validate average calibration for machine learning regression tasks?