Structuring GUI Elements through Vision Language Models: Towards Action Space Generation
