See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles