Model Similarity Mitigates Test Set Overuse