Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models