A Multimodal Approach to Device-Directed Speech Detection with Large Language Models