
Neural Information Processing Systems 

Supplementary to "DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning"

In MTL, deep learning-based architectures that perform soft-parameter sharing, i.e., share model parameters partially, are proving to be effective at exploiting both the commonalities and differences among tasks [

This approach is similar to static gating, but it does not support per-example gating. Moreover, the number of nonzeros cannot be directly controlled (in contrast to our gate).

Next, we show Direction (II). From the definition of r(·), the following holds: r(S(v))

The penalty described above is part of our TensorFlow implementation of DSelect-k.

Note that the logistic function is re-scaled to be on the same scale as the smooth-step function.

Figure B.1: The Smooth-step (γ = 1) and Logistic functions.
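The two curves named in the Figure B.1 caption can be sketched as below. This is a minimal illustration, not the authors' TensorFlow code. The excerpt does not give the smooth-step formula, so the sketch assumes the cubic smooth-step used in the DSelect-k paper: 0 for t ≤ -γ/2, 1 for t ≥ γ/2, and -(2/γ³)t³ + (3/(2γ))t + 1/2 in between. The excerpt also does not say how the logistic function is re-scaled, so the factor 6/γ (chosen so that both curves have the same slope at t = 0) is purely an assumption, and the name `rescaled_logistic` is hypothetical.

```python
import math


def smooth_step(t, gamma=1.0):
    """Cubic smooth-step (assumed form): exactly 0 or 1 outside [-gamma/2, gamma/2]."""
    half = gamma / 2.0
    if t <= -half:
        return 0.0
    if t >= half:
        return 1.0
    # Cubic interpolation; continuous, with zero slope at both ends of the interval.
    return -2.0 / gamma**3 * t**3 + 3.0 / (2.0 * gamma) * t + 0.5


def rescaled_logistic(t, gamma=1.0):
    """Logistic function with an ASSUMED re-scaling of 6/gamma (hypothetical).

    The factor matches the slope of smooth_step at t = 0, which is 3/(2*gamma);
    the source does not state the scaling it actually uses.
    """
    return 1.0 / (1.0 + math.exp(-6.0 * t / gamma))


if __name__ == "__main__":
    for t in (-1.0, -0.25, 0.0, 0.25, 1.0):
        print(t, smooth_step(t), rescaled_logistic(t))
```

Unlike the logistic function, which only approaches 0 and 1 asymptotically, the smooth-step in this sketch reaches exactly 0 and 1 outside the interval [-γ/2, γ/2].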
