Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark