Mixture of Many Zero-Compute Experts: A High-Rate Quantization Theory Perspective
This paper uses classical high-rate quantization theory to provide new insights into mixture-of-experts (MoE) models for regression tasks. Our MoE is defined by a segmentation of the input space into regions, each with a single-parameter expert that acts as a constant predictor with zero compute at inference. Motivated by high-rate quantization theory assumptions, we assume that the number of experts is sufficiently large to make their input-space regions very small. This lets us study the approximation error of our MoE model class: (i) for one-dimensional inputs, we formulate the test error and its minimizing segmentation and experts; (ii) for multidimensional inputs, we formulate an upper bound on the test error and study its minimization. Moreover, we consider the learning of the expert parameters from a training dataset, given an input-space segmentation, and formulate their statistical learning properties. This leads us to show, theoretically and empirically, how the tradeoff between approximation and estimation errors in MoE learning depends on the number of experts.
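To make the setup concrete, here is a minimal sketch (not the authors' code) of such a zero-compute MoE for 1-D regression: the input space is segmented into K equal-width regions, and each region's expert is a single constant fit as the mean of the training targets falling in that region (the least-squares optimal constant). The uniform segmentation, the toy sine target, and the function names `fit_experts` and `predict` are all illustrative assumptions; the loop over K only gestures at the approximation/estimation tradeoff the abstract describes.

```python
import numpy as np

# Illustrative sketch, not the paper's implementation: a "zero-compute"
# MoE for 1-D regression. Each expert is one constant per region, so
# inference is just a region lookup followed by a table read.

def fit_experts(x_train, y_train, edges):
    """One constant parameter per region (the regional target mean);
    empty regions fall back to the global mean."""
    K = len(edges) - 1
    experts = np.full(K, y_train.mean())
    idx = np.clip(np.searchsorted(edges, x_train, side="right") - 1, 0, K - 1)
    for k in range(K):
        mask = idx == k
        if mask.any():
            experts[k] = y_train[mask].mean()
    return experts

def predict(x, edges, experts):
    """Zero-compute inference: locate the region, return its constant."""
    K = len(experts)
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, K - 1)
    return experts[idx]

# Toy demonstration: test error first falls with K (approximation error
# shrinks) and then rises (estimation error dominates as regions empty out).
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 2000)
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.standard_normal(x_train.size)
x_test = rng.uniform(0.0, 1.0, 5000)
y_test_clean = np.sin(2 * np.pi * x_test)

for K in (4, 32, 256, 2048):
    edges = np.linspace(0.0, 1.0, K + 1)
    experts = fit_experts(x_train, y_train, edges)
    mse = np.mean((predict(x_test, edges, experts) - y_test_clean) ** 2)
    print(f"K={K:5d}  test MSE={mse:.4f}")
```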
arXiv.org Artificial Intelligence
Oct-6-2025
- Country:
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Genre:
- Research Report (0.40)
- Industry:
- Education > Educational Setting > Online (0.34)
- Technology: