To See or To Read: User Behavior Reasoning in Multimodal LLMs