WinClick: GUI Grounding with Multimodal Large Language Models