How to organize and track your PyTorch training by creating a run manager
Next, we cover several stages in a single step, because they are closely related and make more sense together. Previously we saw the methods that signal the beginning of a run and of an epoch. In every epoch we iterate over the entire dataset, and for reasons of both computational efficiency and convergence we do so in batches (mini-batch stochastic gradient descent). We therefore have to log some information about each batch to our RunManager, which lets us compute the overall loss and accuracy for the entire epoch. Once the epoch finishes, we compute these metrics, save them to the local collection, and log them to TensorBoard.
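The per-batch tracking and end-of-epoch bookkeeping described above might look roughly like the sketch below. It is a minimal illustration, not the article's actual code: the method names (`begin_epoch`, `track_loss`, `track_num_correct`, `end_epoch`) and the `writer` parameter are assumptions. The writer is expected to be any object exposing `add_scalar(tag, value, step)`, such as `torch.utils.tensorboard.SummaryWriter`; passing `None` skips TensorBoard logging. Plain Python numbers stand in for the values a training loop would typically supply via `loss.item()` and a count of correct predictions.

```python
from collections import OrderedDict


class RunManager:
    """Hypothetical minimal run manager (names are assumptions)."""

    def __init__(self, writer=None):
        # writer: object with add_scalar(tag, value, step), e.g. a
        # torch.utils.tensorboard.SummaryWriter. None disables logging.
        self.writer = writer
        self.epoch_count = 0
        self.epoch_loss = 0.0
        self.epoch_num_correct = 0
        self.run_data = []  # local collection of per-epoch results

    def begin_epoch(self):
        # Reset the accumulators at the start of every epoch.
        self.epoch_count += 1
        self.epoch_loss = 0.0
        self.epoch_num_correct = 0

    def track_loss(self, batch_loss, batch_size):
        # batch_loss is assumed to be the mean loss over the batch,
        # so scale by the batch size to accumulate a per-sample sum.
        self.epoch_loss += batch_loss * batch_size

    def track_num_correct(self, num_correct):
        self.epoch_num_correct += num_correct

    def end_epoch(self, dataset_size):
        # Turn the accumulated sums into epoch-level metrics.
        loss = self.epoch_loss / dataset_size
        accuracy = self.epoch_num_correct / dataset_size
        results = OrderedDict(
            epoch=self.epoch_count, loss=loss, accuracy=accuracy
        )
        self.run_data.append(results)
        if self.writer is not None:
            self.writer.add_scalar("Loss", loss, self.epoch_count)
            self.writer.add_scalar("Accuracy", accuracy, self.epoch_count)
        return results


# Usage with two made-up batches: (mean loss, correct predictions, size).
manager = RunManager()
manager.begin_epoch()
for batch_loss, num_correct, batch_size in [(0.5, 3, 4), (0.25, 4, 4)]:
    manager.track_loss(batch_loss, batch_size)
    manager.track_num_correct(num_correct)
summary = manager.end_epoch(dataset_size=8)
```

Scaling each batch loss by its batch size before summing keeps the epoch loss correct even when the last batch is smaller than the others, which a plain average of batch means would not.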
Oct-6-2022, 04:37:51 GMT