An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue