Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models
