Can Multi-modal (reasoning) LLMs detect document manipulation?
