Pre-Attention Expert Prediction and Prefetching for Mixture-of-Experts Large Language Models
