Can Multi-modal (reasoning) LLMs detect document manipulation?