Compression-aware Training of Neural Networks using Frank-Wolfe