Review for NeurIPS paper: X-CAL: Explicit Calibration for Survival Analysis

Neural Information Processing Systems 

Weaknesses: Any predictive model, and deep neural networks in particular, tends to overfit the training set. This generally makes its predictions on a separate test set too extreme, so that shrinkage is needed (i.e., the calibration slope is less than 1). The authors' X-CAL procedure ensures good calibration on the training set, but calibration on the test set may still be disappointing. It seems to me that one would instead want a procedure that optimizes calibration on a validation set rather than on the training set, which should then lead to good calibration on the separate test set.
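To make the calibration-slope point concrete, here is a minimal sketch that estimates the slope on held-out data by regressing the observed outcome on the predicted log-odds. It assumes binary outcomes at a single fixed time horizon with no censoring, so it is a simplification and not the D-calibration measure X-CAL targets. The function name `calibration_slope` and the simulated "overfit" predictions (log-odds doubled) are hypothetical and chosen only for illustration.

```python
import numpy as np


def calibration_slope(pred_prob, y, n_iter=25):
    """Fit logit P(y=1) = a + b * logit(pred_prob) by Newton-Raphson.

    Returns (a, b); b < 1 means predictions are too extreme.
    Assumes binary, uncensored outcomes (a simplification for illustration).
    """
    p = np.clip(pred_prob, 1e-12, 1 - 1e-12)
    lp = np.log(p / (1 - p))                      # predicted log-odds
    X = np.column_stack([np.ones_like(lp), lp])   # intercept + slope design
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))      # current fitted probabilities
        w = mu * (1 - mu)                         # logistic variance weights
        grad = X.T @ (y - mu)                     # score vector
        hess = X.T @ (X * w[:, None])             # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta[0], beta[1]


# Simulated held-out set (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n = 20000
z = rng.normal(0.0, 1.5, n)                       # true log-odds
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-z)))

well_calibrated = 1.0 / (1.0 + np.exp(-z))        # matches the truth
overfit = 1.0 / (1.0 + np.exp(-2.0 * z))          # twice as extreme, mimicking overfitting

_, b_well = calibration_slope(well_calibrated, y)
_, b_over = calibration_slope(overfit, y)
print(f"slope (well calibrated): {b_well:.2f}")
print(f"slope (overfit):         {b_over:.2f}")
```

The well-calibrated predictions should give a slope near 1, while the overly extreme ones give a slope well below 1 (about 0.5 in expectation here). Because the slope is only informative when estimated on data the model was not trained on, this is the kind of quantity one would monitor on a validation set.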