Evaluating Language Models for Mathematics through Interactions