Neural Information Processing Systems
Supplementary to "DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning"

In MTL, deep learning-based architectures that perform soft-parameter sharing, i.e., share model parameters partially, are proving to be effective at exploiting both the commonalities and differences among tasks [

This approach is similar to static gating, but it does not support per-example gating. Moreover, the number of nonzeros cannot be directly controlled (in contrast to our gate).

Next, we show Direction (II). From the definition of r(·), the following holds: r(S(v))

The penalty described above is part of our TensorFlow implementation of DSelect-k.

Note that the logistic function is re-scaled to be on the same scale as the smooth-step function.

Figure B.1: The Smooth-step (γ = 1) and Logistic functions.
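The comparison in Figure B.1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' TensorFlow code. It assumes the cubic smooth-step form that is 0 below -γ/2, 1 above γ/2, and a cubic polynomial in between; the excerpt above does not state that form. The function names and the `scale` argument are hypothetical, and the exact re-scaling of the logistic function used in the figure is not given here.

```python
import math


def smooth_step(t, gamma=1.0):
    """Assumed cubic smooth-step: exactly 0 for t <= -gamma/2,
    exactly 1 for t >= gamma/2, and a cubic polynomial in between."""
    if t <= -gamma / 2:
        return 0.0
    if t >= gamma / 2:
        return 1.0
    return -2.0 / gamma**3 * t**3 + 3.0 / (2.0 * gamma) * t + 0.5


def logistic(t, scale=1.0):
    """Logistic function; `scale` is a hypothetical knob standing in for
    the re-scaling mentioned in the text, whose value is not given here."""
    return 1.0 / (1.0 + math.exp(-t / scale))


if __name__ == "__main__":
    # Print both functions on a small grid around the transition region.
    for t in (-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0):
        print(f"t={t:+.2f}  smooth_step={smooth_step(t):.4f}  logistic={logistic(t):.4f}")
```

Under this assumed form, the smooth-step reaches exactly 0 and 1 outside the interval [-γ/2, γ/2], whereas the logistic function only approaches those values asymptotically.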
Aug-18-2025, 22:02:25 GMT
- Country:
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)