Randomized Block-Diagonal Preconditioning for Parallel Learning
Mendler-Dünner, Celestine, Lucchi, Aurelien
We study preconditioned gradient-based optimization methods where the preconditioning matrix has block-diagonal form. Such a structural constraint comes with the advantage that the update computation is block-separable and can be parallelized across multiple independent tasks. Our main contribution is to demonstrate that the convergence of these methods can significantly be improved by a randomization technique which corresponds to repartitioning coordinates across tasks during the optimization procedure. We provide a theoretical analysis that accurately characterizes the expected convergence gains of repartitioning and validate our findings empirically on various traditional machine learning tasks. From an implementation perspective, block-separable models are well suited for parallelization and, when shared memory is available, randomization can be implemented on top of existing methods very efficiently to improve convergence.
Jun-24-2020
- Country:
- North America > United States
- Virginia (0.04)
- California > Alameda County
- Berkeley (0.04)
- Europe > Switzerland
- Asia > Middle East
- Jordan (0.04)
- North America > United States
- Genre:
- Research Report > New Finding (0.66)
- Technology: