Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients