A Benchmark for the Detection of Metalinguistic Disagreements between LLMs and Knowledge Graphs