On Scalable and Efficient Computation of Large Scale Optimal Transport