Adaptive Proximal Gradient Methods for Structured Neural Networks

Neural Information Processing Systems 

Lastly, we demonstrate through extensive experiments that stochastic proximal methods outperform subgradient-based approaches.
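To make the contrast between the two families concrete, the following is a minimal sketch on a toy problem, not the adaptive stochastic method or the experiments of this paper. It assumes the objective 0.5 * ||w - b||^2 + lam * ||w||_1, a constant step size, and full (non-stochastic) gradients. The function names, the vector b, and the values of lam and eta are all illustrative assumptions. The proximal update applies soft-thresholding, the proximal operator of the l1 norm, while the subgradient update adds lam * sign(w) to the gradient.

```python
# Toy comparison (illustrative assumption, not the paper's setup):
# minimize 0.5 * ||w - b||^2 + lam * ||w||_1 coordinate-wise.

def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sign(v):
    """Sign of v, using 0 as the subgradient choice at v == 0."""
    return (v > 0) - (v < 0)


def proximal_gradient(b, lam, eta, steps, w0=1.0):
    """Gradient step on the smooth part, then the l1 proximal step."""
    w = [w0] * len(b)
    for _ in range(steps):
        w = [soft_threshold(wi - eta * (wi - bi), eta * lam)
             for wi, bi in zip(w, b)]
    return w


def subgradient_descent(b, lam, eta, steps, w0=1.0):
    """Plain step along a subgradient of the full nonsmooth objective."""
    w = [w0] * len(b)
    for _ in range(steps):
        w = [wi - eta * ((wi - bi) + lam * sign(wi))
             for wi, bi in zip(w, b)]
    return w


if __name__ == "__main__":
    b = [3.0, 0.5, -0.2, -2.0]   # hypothetical targets
    lam, eta, steps = 1.0, 0.5, 200
    print("proximal   :", proximal_gradient(b, lam, eta, steps))
    print("subgradient:", subgradient_descent(b, lam, eta, steps))
```

In this toy setting the minimizer is the soft-thresholding of b by lam, so the two middle coordinates should be exactly zero. The proximal iterates reach those zeros exactly, whereas the constant-step subgradient iterates keep oscillating around zero without settling on it. This structural difference (exact sparsity) is one commonly cited reason to prefer proximal updates for structured models.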