
 pcgrad






In Defense of the Unitary Scalarization for Deep Multi-Task Learning

Neural Information Processing Systems

While some work shows that multi-task networks trained via unitary scalarization exhibit superior performance to independent per-task models [29, 35], others suggest the opposite [30, 54, 58]. However, SMTOs usually require access to per-task gradients, either with respect to the shared parameters or with respect to the shared representation.
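As a rough illustration of the distinction drawn in this excerpt, the sketch below treats unitary scalarization as minimizing the plain sum of the task losses, so only the summed gradient is needed, whereas a method that manipulates per-task gradients has to keep each one separately. The quadratic toy losses and the names `task_grad`, `unitary_scalarization_grad`, and `sgd_step` are assumptions made for this example, not code from the paper.

```python
def task_grad(theta, target):
    # Gradient of the toy task loss 0.5 * ||theta - target||^2
    # with respect to the shared parameters theta (assumed loss).
    return [t - c for t, c in zip(theta, target)]


def unitary_scalarization_grad(theta, targets):
    # Gradient of the summed loss, which equals the sum of the
    # per-task gradients; no per-task gradient needs to be stored.
    grads = [task_grad(theta, c) for c in targets]
    return [sum(column) for column in zip(*grads)]


def sgd_step(theta, grad, lr=0.1):
    # One plain gradient-descent step on the shared parameters.
    return [t - lr * g for t, g in zip(theta, grad)]


theta = [0.0, 0.0]
targets = [[1.0, 0.0], [0.0, 2.0]]  # one hypothetical target per task

# What a per-task-gradient method would need to materialize.
per_task = [task_grad(theta, c) for c in targets]

# What unitary scalarization uses: a single combined gradient.
combined = unitary_scalarization_grad(theta, targets)
theta_next = sgd_step(theta, combined)
```

In an automatic-differentiation framework the combined gradient would come from one backward pass on the summed loss, while the per-task list would typically require one backward pass per task.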



Gradient Surgery for Multi-Task Learning

Neural Information Processing Systems

The optimization landscape of each task consists of a deep valley, a property that has been observed in neural network optimization landscapes [22], and the bottom of each valley is characterized by high positive curvature and large differences in the task gradient magnitudes.
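To make the effect of large differences in gradient magnitude concrete, here is a small sketch with two hand-picked gradient vectors. The vectors, and the assumption that they point in partly opposing directions, are invented for illustration and are not taken from the paper. The sketch checks the cosine similarity between the two gradients and the first-order effect of a summed-gradient step on the task with the smaller gradient.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    # Cosine similarity; a negative value means the gradients oppose each other.
    return dot(a, b) / (norm(a) * norm(b))


g_small = [1.0, 0.0]   # hypothetical gradient of task 1 (norm 1)
g_large = [-8.0, 6.0]  # hypothetical gradient of task 2 (norm 10)

cos = cosine(g_small, g_large)
magnitude_ratio = norm(g_large) / norm(g_small)

# Summed gradient, as used when the task losses are simply added.
summed = [a + b for a, b in zip(g_small, g_large)]

# First-order change in task 1's loss per unit learning rate when
# stepping along -summed: a positive value means task 1 gets worse.
task1_change = -dot(g_small, summed)
```

With these numbers the larger gradient dominates the sum, so a step along the negative summed gradient increases the first task's loss to first order.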