Reviews: Model Similarity Mitigates Test Set Overuse