A Omitted details from Section 3.1
Neural Information Processing Systems
A.1 Implementation of Figure 1

Figure 1 is generated from a simple three-task linear MTL problem that we constructed using Equation (4). The figure was produced in Mathematica.

A.2 Over-parametrization reduces the Pareto front to a singleton

For general non-linear multi-task neural networks, we show that increasing the width reduces the Pareto front to a singleton, at which all tasks simultaneously achieve zero training loss. As a consequence, linear scalarization with any convex coefficients can reach this point. To illustrate this, we follow the same setting as in Section 2, except that the model is a two-layer ReLU multi-task network with bias terms.
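The collapse of the Pareto front can be illustrated numerically. The following is a minimal sketch (not the paper's exact construction): an over-parametrized two-layer ReLU multi-task network with bias terms, trained by plain gradient descent on a linear scalarization with convex coefficients, drives every task's training loss to near zero. All sizes, the random data, and the learning rate are assumptions made for this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, T = 8, 4, 512, 2            # samples, input dim, hidden width, tasks

X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, T))      # one regression target per task

W = rng.standard_normal((m, d)) * np.sqrt(2.0 / d)   # shared first layer
b = np.zeros(m)                                      # bias terms
A = rng.standard_normal((m, T)) * np.sqrt(1.0 / m)   # task-specific heads

lam = np.full(T, 1.0 / T)            # convex scalarization coefficients
lr = 5e-3

def task_losses():
    """Per-task training loss (1/2n) * ||prediction - target||^2."""
    H = np.maximum(X @ W.T + b, 0.0)
    R = H @ A - Y
    return 0.5 * np.mean(R**2, axis=0)

# Gradient descent on the scalarized objective sum_k lam_k * loss_k.
for _ in range(8000):
    H = np.maximum(X @ W.T + b, 0.0)  # (n, m) hidden activations
    R = H @ A - Y                     # (n, T) residuals
    G = (R * lam) / n                 # gradient of scalarized loss w.r.t. outputs
    dA = H.T @ G                      # head gradients
    dH = (G @ A.T) * (H > 0)          # backprop through the ReLU
    W -= lr * (dH.T @ X)
    b -= lr * dH.sum(axis=0)
    A -= lr * dA

print(task_losses())                  # both per-task losses near zero
```

Because the width m far exceeds the number of samples, the network interpolates all tasks simultaneously, so this single zero-loss point is reached regardless of which convex weights lam are chosen.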