References

Neural Information Processing Systems 

Sparse mixture-of-experts are domain generalizable learners.