DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference
