Optimal Gradient Checkpointing for Sparse and Recurrent Architectures using Off-Chip Memory
