Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
–Neural Information Processing Systems
The vast majority of deep models use multiple gradient signals, typically corresponding to a sum of multiple loss terms, to update a shared set of trainable weights.
Neural Information Processing Systems
Oct-2-2025, 05:22:20 GMT