Factual and Musical Evaluation Metrics for Music Language Models