Supplementary to "DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning"
–Neural Information Processing Systems
MTL: In MTL, deep learning-based architectures that perform soft-parameter sharing, i.e., share model parameters partially, are proving to be effective at exploiting both the commonalities and differences among tasks [6]. Our work is also related to [5], who introduced "routers" (similar to gates) that can choose which layers or components of layers to activate per task. The routers in the latter work are not differentiable and require reinforcement learning. To construct $\alpha$, there are two cases to consider: (i) $s = k$ and (ii) $s < k$. If $s = k$, then set $\alpha_i = \log(w_{t_i})$ for $i \in [k]$. Our base case is for $t = 1$.
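The s = k case above can be checked numerically. The sketch below rests on two assumptions that this excerpt does not state: the indices t_1, ..., t_k mark the nonzero entries of a weight vector w on the probability simplex, and the weights are recovered from α through a softmax. The helper names `softmax` and `alpha_from_weights` are hypothetical and chosen only for illustration.

```python
import math


def softmax(alpha):
    """Numerically stable softmax over a list of floats."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def alpha_from_weights(w, support):
    """Set alpha_i = log(w[t_i]) for each index t_i in the support.

    Assumption: every w[t_i] is strictly positive, so the log is defined.
    """
    return [math.log(w[t]) for t in support]


# Hypothetical weight vector with s = 3 nonzero entries, so k = 3 here.
w = [0.5, 0.0, 0.3, 0.2]
support = [i for i, wi in enumerate(w) if wi > 0]  # assumed t_1, ..., t_k

alpha = alpha_from_weights(w, support)
recovered = softmax(alpha)  # should match the nonzero entries of w
```

Because the nonzero entries of w sum to one, exponentiating log(w_{t_i}) and normalizing returns those same entries, which is why this choice of α would reproduce w under the softmax assumption.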
Feb-11-2026, 22:47:04 GMT