Can Multi-modal (reasoning) LLMs work as deepfake detectors?