Is Elo Rating Reliable? A Study Under Model Misspecification