Dissecting the impact of different loss functions with gradient surgery