A.1 Conjugate Derivations. Cross-Entropy Loss: L(h, y) = cX

Neural Information Processing Systems 

The losses are compared on three degrees of shift (easy, moderate, and hard), which are controlled by the drifted distance of the Gaussian clusters. Here we discuss the architecture chosen and the implementation details. Note that the task loss / surrogate loss function is used to update the meta-loss m_ϕ during meta-learning. The number of transformer layers and the number of hidden layers in the MLP are selected from {1, 2}. We can see that the task loss barely affects the learnt meta-loss.
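As a rough illustration of how a drifted-cluster benchmark of this kind could be set up, the sketch below draws two Gaussian clusters and shifts their means by a drift distance. The cluster centres, the unit variance, the drift values in `DRIFT`, and the function name `sample_clusters` are all assumptions made for this example; the excerpt does not state the values actually used.

```python
import random

# Hypothetical drift distances for the three degrees of shift.
# These numbers are placeholders, not values taken from the source.
DRIFT = {"easy": 0.5, "moderate": 1.0, "hard": 2.0}

# Assumed source-domain cluster centres for a two-class toy problem.
CENTRES = {0: (-2.0, 0.0), 1: (2.0, 0.0)}


def sample_clusters(n_per_class, drift=0.0, seed=0):
    """Draw 2-D points from two unit-variance Gaussian clusters.

    Both cluster means are moved along the x-axis by `drift`, so a
    larger drift gives a target domain further from the source.
    Returns a list of ((x, y), label) pairs.
    """
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in CENTRES.items():
        for _ in range(n_per_class):
            x = rng.gauss(cx + drift, 1.0)
            y = rng.gauss(cy, 1.0)
            data.append(((x, y), label))
    return data


# Source domain (no drift) and one target domain per degree of shift.
# The same seed is reused so the drift is the only difference.
source = sample_clusters(100)
targets = {name: sample_clusters(100, drift=d) for name, d in DRIFT.items()}
```

Each entry of `targets` can then be used to evaluate a loss under the corresponding degree of shift.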
