AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)
Information Technology > Communications > Mobile (0.64)

Neural Information Processing Systems, Oct-9-2025, 06:04:40 GMT

bc4cff0b37ccab13e98b6128d89ca172-Supplemental-Datasets_and_Benchmarks.pdf

artificial intelligence, information management, machine learning, (20 more...)

Country: Asia > Taiwan > Takao Province > Kaohsiung (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Graphics (0.69)
Information Technology > Human Computer Interaction > Interfaces (0.69)
Information Technology > Communications (0.69)

Neural Information Processing Systems, Oct-9-2025, 06:04:36 GMT

Pairwise GUI Dataset Construction Between Android Phones and Tablets

GUI development to enhance developers' productivity.

data mining, machine learning, natural language, (21 more...)

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > Dominican Republic (0.04)
Asia > Taiwan > Takao Province > Kaohsiung (0.04)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Kasibatla, Saketh Ram, Hiremath, Kiran Medleri, Rothkopf, Raven, Lerner, Sorin, Xia, Haijun, Hempel, Brian

The Command Line GUIde: Graphical Interfaces from Man Pages via AI

arXiv.org Artificial Intelligence, Oct-3-2025

Although birthed in the era of teletypes, the command line shell survived the graphical interface revolution of the 1980s and lives on in modern desktop operating systems. The command line provides access to powerful functionality not otherwise exposed on the computer, but requires users to recall textual syntax and carefully scour documentation. In contrast, graphical interfaces let users organically discover and invoke possible actions through widgets and menus. To better expose the power of the command line, we demonstrate a mechanism for automatically creating graphical interfaces for command line tools by translating their documentation (in the form of man pages) into interface specifications via AI. Using these specifications, our user-facing system, called GUIde, presents the command options to the user graphically. We evaluate the generated interfaces on a corpus of commands to show to what degree GUIde offers thorough graphical interfaces for users' real-world command line tasks.

artificial intelligence, large language model, natural language, (19 more...)

2510.01453

Country: North America > United States > California (0.68)

Genre: Research Report (0.40)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

arXiv.org Artificial Intelligence, May-21-2025

ViMo: A Generative Visual GUI World Model for App Agents

Luo, Dezhao, Tang, Bohan, Li, Kang, Papoudakis, Georgios, Song, Jifei, Gong, Shaogang, Hao, Jianye, Wang, Jun, Shao, Kun

App agents, which autonomously operate mobile Apps through Graphical User Interfaces (GUIs), have gained significant interest in real-world applications. Yet, they often struggle with long-horizon planning, failing to find the optimal actions for complex tasks that span many steps. To address this, world models are used to predict the next GUI observation based on user actions, enabling more effective agent planning. However, existing world models primarily focus on generating only textual descriptions, lacking essential visual details. To fill this gap, we propose ViMo, the first visual world model designed to generate future App observations as images. For the challenge of generating text in image patches, where even minor pixel errors can distort readability, we decompose GUI generation into graphic and text content generation. We propose a novel data representation, the Symbolic Text Representation (STR), to overlay text content with symbolic placeholders while preserving graphics. With this design, ViMo employs an STR Predictor to predict future GUIs' graphics and a GUI-text Predictor for generating the corresponding text. Moreover, we deploy ViMo to enhance agent-focused tasks by predicting the outcome of different action options. Experiments show ViMo's ability to generate visually plausible and functionally effective GUIs that enable App agents to make more informed decisions.

large language model, machine learning, natural language, (19 more...)

2504.13936

Country: Europe (0.28)

Genre:

Research Report (0.82)
Workflow (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Garg, Aryan, Jiang, Yue, Oulasvirta, Antti

Controllable GUI Exploration

arXiv.org Artificial Intelligence, Feb-5-2025

During the early stages of interface design, designers need to produce multiple sketches to explore a design space. Design tools often fail to support this critical stage, because they insist on specifying more details than necessary. Although recent advances in generative AI have raised hopes of solving this issue, in practice they fail because expressing loose ideas in a prompt is impractical. In this paper, we propose a diffusion-based approach to the low-effort generation of interface sketches. It breaks new ground by allowing flexible control of the generation process via three types of inputs: A) prompts, B) wireframes, and C) visual flows. The designer can provide any combination of these as input at any level of detail, and will get a diverse gallery of low-fidelity solutions in response. The unique benefit is that large design spaces can be explored rapidly with very little effort in input specification. We present qualitative results for various combinations of input specifications. Additionally, we demonstrate that our model aligns more accurately with these specifications than other models.

artificial intelligence, machine learning, natural language, (21 more...)

2502.0333

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Finland (0.05)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Graphics (0.97)

Neural Information Processing Systems, Jan-19-2025, 20:45:56 GMT

Pairwise GUI Dataset Construction Between Android Phones and Tablets

In the current landscape of pervasive smartphones and tablets, apps frequently exist across both platforms. Although apps share most graphical user interfaces (GUIs) and functionalities across phones and tablets, developers often rebuild from scratch for tablet versions, escalating costs and squandering existing design resources. Researchers are attempting to collect data and employ deep learning in automated GUI development to enhance developers' productivity. There are currently several publicly accessible GUI page datasets for phones, but none for pairwise GUIs between phones and tablets. This poses a significant barrier to the employment of deep learning in automated GUI development. In this paper, we introduce the Papt dataset, a pioneering pairwise GUI dataset tailored for Android phones and tablets, encompassing 10,035 phone-tablet GUI page pairs sourced from 5,593 unique app pairs. We propose novel pairwise GUI collection approaches for constructing this dataset and delineate its advantages over currently prevailing datasets in the field. Through preliminary experiments on this dataset, we analyze the present challenges of utilizing deep learning in automated GUI development.

android phone and tablet, deep learning, pairwise gui dataset construction, (2 more...)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.79)

arXiv.org Artificial Intelligence, May-14-2024

Impact of Design Decisions in Scanpath Modeling

Emami, Parvin, Jiang, Yue, Guo, Zixin, Leiva, Luis A.

Modeling visual saliency in graphical user interfaces (GUIs) makes it possible to understand how people perceive GUI designs and what elements attract their attention. One aspect that is often overlooked is the fact that computational models depend on a series of design parameters that are not straightforward to decide. We systematically analyze how different design parameters affect scanpath evaluation metrics using a state-of-the-art computational model (DeepGaze++). We particularly focus on three design parameters: input image size, inhibition-of-return decay, and masking radius. We show that even small variations of these design parameters have a noticeable impact on standard evaluation metrics such as DTW or Eyenalysis. These effects also occur in other scanpath models, such as UMSS and ScanGAN, and in other datasets such as MASSVIS. Taken together, our results put forward the impact of design decisions for predicting users' viewing behavior on GUIs.

design parameter, fixation point, scanpath model, (16 more...)
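
The scanpath-modeling abstract above names DTW (dynamic time warping) as one of its standard evaluation metrics. As a rough illustration of what such a metric computes, here is a minimal sketch of textbook DTW between two scanpaths. It assumes each scanpath is a list of (x, y) fixation coordinates and uses Euclidean distance as the per-fixation cost. The function name `dtw_distance` and the sample data are hypothetical; this is not the paper's evaluation code, which may normalize or scale coordinates differently.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Textbook O(len(a) * len(b)) dynamic time warping (illustrative only)."""
    if not a or not b:
        raise ValueError("both scanpaths must contain at least one fixation")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean cost (assumption)
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats its fixation
                cost[i][j - 1],      # b advances, a repeats its fixation
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical data: the second path dwells one extra step on the first fixation.
human = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
model = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(dtw_distance(human, model))  # 0.0
```

Because DTW lets one path repeat a fixation while the other advances, the two sample paths align at zero cost despite differing in length. The same elasticity makes the score depend on how many fixations a model emits and in what coordinate system, which is one plausible route by which parameters such as input image size could shift the metric.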