Introducing Multiple ModelCheckpoint Callbacks
When training a model, there is always a chance that something will fail unexpectedly. Proper checkpointing provides a safety net: after a failure, users can restore the state of the model and trainer from a checkpoint file. In Lightning, checkpointing is a core feature of the Trainer and is turned on by default, creating a checkpoint after each epoch. But checkpointing provides more than a safety net. Often we care about keeping track of the "best" model weights encountered during training, because in practice not every new epoch improves the generalization error, for example when optimization is unstable or the model starts to overfit.
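The idea of tracking the "best" weights can be illustrated with a small, framework-free sketch. It is not Lightning's implementation: the class `BestCheckpointTracker` and the validation losses are hypothetical, made up for illustration. In Lightning itself, this role is played by the `ModelCheckpoint` callback through arguments such as `monitor`, `mode`, and `save_top_k`.

```python
import math


class BestCheckpointTracker:
    """Toy tracker (hypothetical, not part of Lightning): remembers which
    epoch produced the best value of a monitored metric."""

    def __init__(self, mode="min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.mode = mode
        # Start from the worst possible score so the first epoch always counts.
        self.best_score = math.inf if mode == "min" else -math.inf
        self.best_epoch = -1

    def update(self, epoch, score):
        """Return True if this epoch improved on the best score so far,
        i.e. if a 'best' checkpoint would be written at this point."""
        if self.mode == "min":
            improved = score < self.best_score
        else:
            improved = score > self.best_score
        if improved:
            self.best_score = score
            self.best_epoch = epoch
        return improved


# Made-up validation losses: epochs 2 and 4 are worse than their predecessors.
val_losses = [0.90, 0.70, 0.75, 0.60, 0.65]

tracker = BestCheckpointTracker(mode="min")
saved_epochs = [
    epoch for epoch, loss in enumerate(val_losses) if tracker.update(epoch, loss)
]

print("epochs that would overwrite the best checkpoint:", saved_epochs)
print("best epoch:", tracker.best_epoch, "with loss", tracker.best_score)
```

With these numbers, the final epoch is not the best one, which is exactly the case where keeping only the most recent checkpoint would lose the better weights.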
Dec-2-2021, 12:48:17 GMT