AITopics | gui

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Pairwise GUI Dataset Construction Between Android Phones and Tablets

Neural Information Processing Systems | Feb-16-2026, 19:41:55 GMT

GUIs development to enhance developers' productivity.

data mining, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > Dominican Republic (0.04)
Asia > Taiwan > Takao Province > Kaohsiung (0.04)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

Supplementary Information A Collecting Internet Data

Neural Information Processing Systems | Feb-11-2026, 00:28:17 GMT

The tremendous, natural-language-conditioned variety perform compositional 290).

artificial intelligence, machine learning, minecraft, (15 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training

Xu, Donglai, Yang, Hongzheng, Zhao, Yuzhi, Zhang, Pingping, Chen, Jinpeng, Ma, Wenao, Hou, Zhijian, Wu, Mengyang, Li, Xiaolei, Hu, Senkang, Guan, Ziyi, Li, Jason Chun Lok, Po, Lai Man

arXiv.org Artificial Intelligence | Nov-12-2025

Reinforcement Learning with Verifiable Rewards (RLVR) for Multimodal Large Language Models (MLLMs) is highly dependent on high-quality labeled data, which is often scarce and prone to substantial annotation noise in real-world scenarios. Existing unsupervised RLVR methods, including pure entropy minimization, can overfit to incorrect labels and limit the crucial reward ranking signal for Group-Relative Policy Optimization (GRPO). To address these challenges and enhance noise tolerance, we propose a novel two-stage, token-level entropy optimization method for RLVR. This approach dynamically guides the model from exploration to exploitation during training. In the initial exploration phase, token-level entropy maximization promotes diverse and stochastic output generation, serving as a strong regularizer that prevents premature convergence to noisy labels and ensures sufficient intra-group variation, which enables more reliable reward gradient estimation in GRPO. As training progresses, the method transitions into the exploitation phase, where token-level entropy minimization encourages the model to produce confident and deterministic outputs, thereby consolidating acquired knowledge and refining prediction accuracy. Empirically, across three MLLM backbones - Qwen2-VL-2B, Qwen2-VL-7B, and Qwen2.5-VL-3B - spanning diverse noise settings and multiple tasks, our phased strategy consistently outperforms prior approaches by unifying and enhancing external, internal, and entropy-based methods, delivering robust and superior performance across the board.
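The exploration-to-exploitation idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the hard switch at a fixed fraction of training, the coefficient size, and the names `entropy_coefficient`, `switch_fraction`, `scale`, and `total_loss` are all assumptions made for illustration, and `policy_loss` stands in for whatever GRPO objective is actually used.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one token's probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_token_entropy(token_dists):
    """Average entropy over all token positions of a generated response."""
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)


def entropy_coefficient(step, total_steps, switch_fraction=0.5, scale=0.01):
    """Hypothetical two-stage schedule (assumption, not from the paper).

    Positive before the switch point (entropy is rewarded: exploration),
    negative afterwards (entropy is penalized: exploitation).
    """
    switch_step = int(total_steps * switch_fraction)
    return scale if step < switch_step else -scale


def total_loss(policy_loss, token_dists, step, total_steps):
    """Combine a placeholder policy loss with the signed entropy term.

    The loss is minimized, so subtracting coeff * entropy raises entropy
    while coeff > 0 and lowers it once coeff < 0.
    """
    coeff = entropy_coefficient(step, total_steps)
    return policy_loss - coeff * mean_token_entropy(token_dists)
```

With a uniform distribution over four tokens, the entropy term lowers the loss early in training and raises it late, which is the sign flip the abstract describes. A smoother transition between the two phases would be an equally plausible reading.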

grpow, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.07738

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Xie, Tianbao, Deng, Jiaqi, Li, Xiaochuan, Yang, Junlin, Wu, Haoyuan, Chen, Jixuan, Hu, Wenjing, Wang, Xinyuan, Xu, Yuhui, Wang, Zekun, Xu, Yiheng, Wang, Junli, Sahoo, Doyen, Yu, Tao, Xiong, Caiming

arXiv.org Artificial Intelligence | Oct-28-2025

Graphical user interface (GUI) grounding, the ability to map natural language instructions to specific actions on graphical user interfaces, remains a critical bottleneck in computer use agent development. Current benchmarks oversimplify grounding tasks as short referring expressions, failing to capture the complexity of real-world interactions that require software commonsense, layout understanding, and fine-grained manipulation capabilities. To address these limitations, we introduce OSWorld-G, a comprehensive benchmark comprising 564 finely annotated samples across diverse task types including text matching, element recognition, layout understanding, and precise manipulation. Additionally, we synthesize and release the largest computer use grounding dataset Jedi, which contains 4 million examples through multi-perspective decoupling of tasks. Our multi-scale models trained on Jedi demonstrate its effectiveness by outperforming existing approaches on ScreenSpot-v2, ScreenSpot-Pro, and our OSWorld-G. Furthermore, we demonstrate that improved grounding with Jedi directly enhances agentic capabilities of general foundation models on complex computer tasks, improving from 5% to 27% on OSWorld. Through detailed ablation studies, we identify key factors contributing to grounding performance and verify that combining specialized data for different interface elements enables compositional generalization to novel interfaces. All benchmark, data, checkpoints, and code are open-sourced and available at https://osworld-grounding.github.io.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.13227

Genre:

Research Report (1.00)
Workflow (0.93)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Information Technology > Software (1.00)
(7 more...)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)

Development of an Intuitive GUI for Non-Expert Teleoperation of Humanoid Robots

Barret, Austin, Lau, Meng Cheng

arXiv.org Artificial Intelligence | Oct-16-2025

The operation of humanoid robotics is an essential field of research with many practical and competitive applications. Many of these systems, however, do not invest heavily in developing a non-expert-centered graphical user interface (GUI) for operation. The focus of this research is to develop a scalable GUI that is tailored to be simple and intuitive so non-expert operators can control the robot through a FIRA-regulated obstacle course. Using common practices from user interface (UI) development and concepts from human-robot interaction (HRI) and related fields, we will develop a new interface aimed at non-expert teleoperation.

artificial intelligence, human computer interaction, robot, (18 more...)

arXiv.org Artificial Intelligence

2510.13594

Genre: Research Report (0.40)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.71)

Pairwise GUI Dataset Construction Between Android Phones and Tablets

Neural Information Processing Systems | Oct-9-2025, 06:04:36 GMT

GUIs development to enhance developers' productivity.

data mining, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > Dominican Republic (0.04)
Asia > Taiwan > Takao Province > Kaohsiung (0.04)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing

Jing, Hongyi, Chen, Jiafu, Rao, Chen, Dang, Ziqiang, Teng, Jiajie, Chu, Tianyi, Mo, Juncheng, Fang, Shuo, Lin, Huaizhong, Lv, Rui, Ma, Chenguang, Zhao, Lei

arXiv.org Artificial Intelligence | Sep-8-2025

The existing Multimodal Large Language Models (MLLMs) for GUI perception have made great progress. However, the following challenges still exist in prior methods: 1) They model discrete coordinates based on a text autoregressive mechanism, which results in lower grounding accuracy and slower inference speed. 2) They can only locate predefined sets of elements and are not capable of parsing the entire interface, which hampers the broad application and support for downstream tasks. To address the above issues, we propose SparkUI-Parser, a novel end-to-end framework where higher localization precision and fine-grained parsing capability of the entire interface are simultaneously achieved. Specifically, instead of using probability-based discrete modeling, we perform continuous modeling of coordinates based on a pre-trained Multimodal Large Language Model (MLLM) with an additional token router and coordinate decoder. This effectively mitigates the limitations inherent in the discrete output characteristics and the token-by-token generation process of MLLMs, consequently boosting both the accuracy and the inference speed. To further enhance robustness, a rejection mechanism based on a modified Hungarian matching algorithm is introduced, which empowers the model to identify and reject non-existent elements, thereby reducing false positives. Moreover, we present ScreenParse, a rigorously constructed benchmark to systematically assess structural perception capabilities of GUI models across diverse scenarios. Extensive experiments demonstrate that our approach consistently outperforms SOTA methods on ScreenSpot, ScreenSpot-v2, CAGUI-Grounding and ScreenParse benchmarks. The resources are available at https://github.com/antgroup/SparkUI-Parser.
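The rejection idea mentioned in this abstract can be sketched as optimal one-to-one matching followed by a cost threshold. The abstract does not specify how the Hungarian algorithm is modified, so the sketch below is an assumption throughout: it swaps in a brute-force search over assignments (which finds the same optimum as the Hungarian algorithm on small inputs), and the names `match_with_rejection` and `reject_threshold` are hypothetical.

```python
from itertools import permutations


def match_with_rejection(cost, reject_threshold):
    """Match each query (row) to a distinct candidate element (column).

    cost[i][j] is the cost of pairing query i with candidate j.
    Brute-force stand-in for the Hungarian algorithm: tries every
    one-to-one assignment and keeps the cheapest, so it only suits
    small matrices. Queries whose matched cost exceeds the threshold
    are rejected (None), modelling "element does not exist".
    """
    n_rows, n_cols = len(cost), len(cost[0])
    if n_rows > n_cols:
        raise ValueError("need at least as many candidates as queries")

    best_cols, best_total = None, float("inf")
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if total < best_total:
            best_cols, best_total = cols, total

    return [
        c if cost[r][c] <= reject_threshold else None
        for r, c in enumerate(best_cols)
    ]


# Illustrative costs only: the third query has no good candidate.
example_cost = [
    [0.10, 0.90, 0.80],
    [0.70, 0.20, 0.90],
    [0.95, 0.90, 0.85],
]
example_matches = match_with_rejection(example_cost, reject_threshold=0.5)
```

Here the first two queries are matched to candidates 0 and 1, while the third is rejected because even its assigned candidate costs more than the threshold. Thresholding after assignment is one simple way to cut false positives; the paper may integrate rejection into the matching itself.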

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.04908

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Mechanical Automation with Vision: A Design for Rubik's Cube Solver

Chalise, Abhinav, Pradhan, Nimesh Gopal, Khanal, Nishan, Bista, Prashant Raj, Kshatri, Dinesh Baniya

arXiv.org Artificial Intelligence | Aug-19-2025

The core mechanical system is built around three stepper motors for physical manipulation, a microcontroller for hardware control, and a camera with a YOLO detection model for real-time cube state detection. A significant software component is a user-friendly graphical user interface (GUI) designed in Unity. The initial state detected by the real-time YOLOv8 model (Precision 0.98443, Recall 0.98419, Box Loss 0.42051, Class Loss 0.2611) is virtualized on the GUI. To find the solution, the system employs Kociemba's algorithm, while physical manipulation with a single degree of freedom is performed by the stepper motors' combined interaction with the cube, achieving an average solving time of ~2.2 minutes.

artificial intelligence, machine learning, rubik, (15 more...)

arXiv.org Artificial Intelligence

2508.12469

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Rubik's Cube (0.48)

Technology: