In Defense of the Unitary Scalarization for Deep Multi-Task Learning

Neural Information Processing Systems 

While some work shows that multi-task networks trained via unitary scalarization exhibit superior performance to independent per-task models [29, 35], others suggest the opposite [30, 54, 58]. However, specialized multi-task optimizers (SMTOs) usually require access to per-task gradients, either with respect to the shared parameters or with respect to the shared representation.
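To make the difference in gradient access concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that unitary scalarization minimizes the plain sum of task losses. The toy quadratic tasks and the names `make_task`, `numerical_grad`, `unitary_scalarization_grad`, and `per_task_grads` are hypothetical and introduced only for illustration. Gradients are taken by central differences so that the example needs no autodiff library.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Loss = Callable[[Sequence[float]], float]


def make_task(center: Sequence[float]) -> Loss:
    """Hypothetical toy task: L(w) = 0.5 * ||w - center||^2 over shared parameters w."""
    def loss(w: Sequence[float]) -> float:
        return 0.5 * sum((wi - ci) ** 2 for wi, ci in zip(w, center))
    return loss


def numerical_grad(f: Loss, w: Sequence[float], eps: float = 1e-5) -> Vector:
    """Central-difference gradient of a scalar function f at w."""
    grad = []
    for j in range(len(w)):
        up, down = list(w), list(w)
        up[j] += eps
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def unitary_scalarization_grad(tasks: Sequence[Loss], w: Sequence[float]) -> Vector:
    """One gradient computation on the summed loss (assumed form of unitary scalarization)."""
    return numerical_grad(lambda v: sum(task(v) for task in tasks), w)


def per_task_grads(tasks: Sequence[Loss], w: Sequence[float]) -> List[Vector]:
    """One gradient computation per task, as an SMTO would typically need."""
    return [numerical_grad(task, w) for task in tasks]


tasks = [make_task([1.0, 0.0]), make_task([0.0, 2.0]), make_task([-1.0, 1.0])]
w = [0.5, 0.5]

g_sum = unitary_scalarization_grad(tasks, w)  # a single aggregate gradient
g_each = per_task_grads(tasks, w)             # one gradient per task
```

In this sketch, `g_sum` equals the element-wise sum of the vectors in `g_each` up to floating-point error, but obtaining `g_each` takes one gradient computation per task, whereas `g_sum` takes only one regardless of the number of tasks.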