Review for NeurIPS paper: Gradient Surgery for Multi-Task Learning

Neural Information Processing Systems 

Though the toy example in Figure 1 is quite intuitive and a convergence proof for the convex case is provided, the paper misses an important part: how severe is the conflicting gradient problem in practical tasks? I suggest that the authors plot the cosine similarity between the gradients of the different losses. I also went through the appendix but failed to find such plots. This is my main concern. Though I understand it's a workaround for practical use, it makes the generalization of Theorems 1 and 2 to the case n > 2 (where n is the number of task loss functions) non-trivial.
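As a rough illustration of the diagnostic requested above, here is a minimal sketch. It assumes that each task's gradient with respect to the shared parameters has already been flattened into a plain list of floats. The helper names (cosine_similarity, pairwise_gradient_cosines, conflict_fraction) are hypothetical and not taken from the paper. Logging these values at each training step would give the kind of plot I have in mind. Following the paper's usage, "conflicting" here means a negative cosine similarity.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Undefined for a zero gradient; treated as orthogonal (an assumption).
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_gradient_cosines(task_grads):
    """Return {(i, j): cos(g_i, g_j)} for every task pair with i < j."""
    return {
        (i, j): cosine_similarity(task_grads[i], task_grads[j])
        for i, j in combinations(range(len(task_grads)), 2)
    }


def conflict_fraction(cosines):
    """Fraction of task pairs whose gradients conflict (negative cosine)."""
    if not cosines:
        return 0.0
    return sum(1 for c in cosines.values() if c < 0) / len(cosines)


# Toy usage with three hypothetical 2-D task gradients.
grads = [[1.0, 0.0], [-1.0, 1.0], [0.0, 2.0]]
cosines = pairwise_gradient_cosines(grads)
fraction = conflict_fraction(cosines)
```

In this toy case only the pair (0, 1) conflicts, so one of the three pairs is flagged. With real models, the histogram of these cosines over training would show how often, and how strongly, the task gradients actually conflict.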