Forward Gradient-Based Frank-Wolfe Optimization for Memory Efficient Deep Neural Network Training
