Assessing the Alignment of FOL Closeness Metrics with Human Judgement