Aug-18-2025, 22:02:25 GMT–Neural Information Processing Systems

Supplementary to "DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning"

In MTL, deep learning-based architectures that perform soft-parameter sharing, i.e., share model parameters partially, are proving to be effective at exploiting both the commonalities and differences among tasks [...

This approach is similar to static gating, but it does not support per-example gating. Moreover, the number of nonzeros cannot be directly controlled (in contrast to our gate).

Next, we show Direction (II). From the definition of r(·), the following holds: r(S(v)) ...

The penalty described above is part of our TensorFlow implementation of DSelect-k.

Note that the logistic function is re-scaled to be on the same scale as the smooth-step function.

Figure B.1: The Smooth-step (γ = 1) and Logistic functions.
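The excerpt compares the smooth-step function (γ = 1) with a re-scaled logistic function. As a rough illustration, the sketch below evaluates both in plain Python. It assumes the cubic smooth-step form usually associated with DSelect-k (0 below -γ/2, 1 above γ/2, a cubic in between). The excerpt does not state how the logistic is re-scaled, so the factor 6/γ used here (chosen to match the smooth-step slope at 0) is a hypothetical choice, as are the function names.

```python
import math


def smooth_step(t: float, gamma: float = 1.0) -> float:
    """Cubic smooth-step (assumed form): exactly 0 or 1 outside [-gamma/2, gamma/2]."""
    if t <= -gamma / 2:
        return 0.0
    if t >= gamma / 2:
        return 1.0
    return -2.0 / gamma**3 * t**3 + 3.0 / (2.0 * gamma) * t + 0.5


def scaled_logistic(t: float, gamma: float = 1.0) -> float:
    """Logistic with a hypothetical re-scaling: slope at 0 matches smooth_step."""
    return 1.0 / (1.0 + math.exp(-6.0 * t / gamma))


if __name__ == "__main__":
    # Print both functions side by side on a few sample points.
    for t in (-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0):
        print(f"t={t:+.2f}  smooth_step={smooth_step(t):.4f}  "
              f"logistic={scaled_logistic(t):.4f}")
```

Under these assumptions the smooth-step reaches exactly 0 and 1 outside the interval [-γ/2, γ/2], whereas the logistic only approaches those values asymptotically.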

dselect-k, jaccard index, moe

Country:
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
