Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization