A.1 Conjugate Derivations Cross-Entropy Loss: L(h,y) = cX
–Neural Information Processing Systems
The losses are compared on three degrees of shift (easy, moderate, and hard), which are controlled by the drifted distance of the Gaussian clusters. Here we discuss the architecture chosen and the implementation details. Note that the task loss / surrogate loss function is used to update the meta-loss mϕ during meta-learning. The number of transformer layers and the number of hidden layers in the MLP are selected from {1, 2}. We can see that the task loss barely affects the learnt meta-loss.
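To make the shift setup concrete, the following is a minimal sketch of how Gaussian clusters might be drifted by three different distances. It is an illustration under stated assumptions, not the authors' data pipeline: the function name `make_clusters`, the two-class 2-D layout, the cluster means and standard deviation, and the drift values in `DRIFT` are all hypothetical and do not come from the source.

```python
import numpy as np

# Hypothetical drift distances for the three shift levels (assumed, not from the source).
DRIFT = {"easy": 0.5, "moderate": 1.0, "hard": 2.0}


def make_clusters(n_per_class, drift=0.0, seed=0):
    """Sample two 2-D Gaussian clusters whose means are translated by `drift` along x."""
    rng = np.random.default_rng(seed)
    means = np.array([[-1.0, 0.0], [1.0, 0.0]])  # assumed source-domain cluster means
    shifted = means + np.array([drift, 0.0])     # move every cluster by the drift distance
    x = np.concatenate(
        [rng.normal(m, 0.5, size=(n_per_class, 2)) for m in shifted]
    )
    y = np.repeat(np.arange(2), n_per_class)     # class label = cluster index
    return x, y


# Unshifted source data and one shifted target set per degree of shift.
source_x, source_y = make_clusters(200)
targets = {name: make_clusters(200, drift=d, seed=1) for name, d in DRIFT.items()}
```

In this sketch a larger drift distance moves the target clusters further from the source clusters, which is one plausible reading of what makes a shift "hard"; the actual geometry used in the experiments may differ.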
Feb-19-2026, 00:13:04 GMT