Evaluating Multimodal Large Language Models with Daily Composite Tasks in Home Environments