Speeding Up Distributed Gradient Descent by Utilizing Non-persistent Stragglers