Review for NeurIPS paper: GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification

Summary and Contributions: The Trust Region Newton algorithm (TRON) is the most efficient solver for L2-regularized primal problems such as logistic regression (LR) and support vector machine (SVM) training. Because of the algorithm's complex, largely sequential structure, its past performance gains have come mainly from shared-memory multi-core systems. This paper demonstrates significant speedups in TRON training time over multithreaded implementations by applying GPU-specific optimization principles. The authors develop separate optimizations for sparse-representation problems (LR training) and dense-representation problems (SVM training), yielding substantial reductions in training time on GPUs. In particular, for datasets with sparse feature representations and the LR loss function, they minimize the sequential dependence between CPU and GPU execution by assuming that conditional branches will resolve in favor of the high-compute operations, which can then be launched pre-emptively on the GPU.
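To make the speculative-execution idea concrete, here is a minimal, hypothetical sketch (not the authors' code, and using plain Python threads rather than an actual GPU): the expensive operation is launched pre-emptively, overlapping with the cheap CPU-side branch check, and its result is kept only if the branch resolves in its favor. The function names `expensive_matvec` and `cheap_branch_check` are illustrative stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def expensive_matvec(v):
    # Stand-in for a high-compute GPU kernel
    # (e.g. a Hessian-vector product in TRON's inner CG loop).
    return [2.0 * x for x in v]

def cheap_branch_check(residual):
    # Stand-in for the CPU-side conditional (e.g. a convergence test)
    # that normally gates the expensive operation.
    return residual > 1e-6

def speculative_step(v, residual):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Launch the heavy work pre-emptively, assuming the branch will
        # resolve in its favor; this overlaps it with the cheap check.
        future = pool.submit(expensive_matvec, v)
        if cheap_branch_check(residual):
            return future.result()  # common case: speculative result is used
        future.cancel()             # rare case: discard the speculative work
        return None
```

The payoff is that in the common (not-yet-converged) case the expensive computation hides the latency of the check, at the cost of occasionally wasted work when the branch resolves the other way.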