GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models