Toward Stable and Consistent Evaluation Results: A New Methodology for Base Model Evaluation