Learning Chess Blindfolded: Evaluating Language Models on State Tracking