Biased Local SGD for Efficient Deep Learning on Heterogeneous Systems