Rethink the Evaluation Protocol of Model Merging on Classification Task