LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation
Zhang, Li, Wang, Shihe, Jia, Xianqing, Zheng, Zhihan, Yan, Yunhe, Gao, Longxi, Li, Yuanchun, Xu, Mengwei
Emerging large language and multimodal models are accelerating the evolution of mobile agents, especially for mobile UI automation. However, existing evaluation approaches, which rely on human validation or established datasets to compare agent-predicted actions with predefined ones, are unscalable and unfaithful. To overcome these limitations, this paper presents LlamaTouch, a testbed for on-device agent execution and faithful, scalable agent evaluation. Observing that task execution only transitions between UI states, LlamaTouch employs a novel evaluation approach that assesses only whether an agent traverses all manually annotated, essential application/system states. LlamaTouch comprises three key techniques: (1) on-device task execution that enables mobile agents to interact with real mobile environments for task completion; (2) fine-grained UI component annotation that merges pixel-level screenshots and textual screen hierarchies to explicitly identify and precisely annotate essential UI components with a rich set of designed annotation primitives; and (3) a multi-level state matching algorithm that uses exact and fuzzy matching to accurately detect critical information in each screen despite unpredictable UI layout/content dynamics. LlamaTouch currently incorporates four mobile agents and 495 UI automation tasks, encompassing both tasks from widely used datasets and self-constructed ones covering more diverse mobile applications. Evaluation results demonstrate LlamaTouch's high evaluation faithfulness in real environments and its better scalability than human validation. LlamaTouch also enables easy task annotation and integration of new mobile agents. Code and dataset are publicly available at https://github.com/LlamaTouch/LlamaTouch.
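The "traverse all essential states" criterion can be made concrete with a small sketch. This is only an illustration of the exact-then-fuzzy matching idea described above; the function names, similarity measure, and threshold are assumptions, not the paper's actual implementation.

```python
# Sketch of multi-level UI state matching: try exact matching first, then
# fall back to fuzzy string similarity to tolerate dynamic screen content.
from difflib import SequenceMatcher

def exact_match(observed: str, annotated: str) -> bool:
    """Exact matching: the annotated text must appear verbatim on screen."""
    return annotated in observed

def fuzzy_match(observed: str, annotated: str, threshold: float = 0.8) -> bool:
    """Fuzzy matching: tolerate UI content dynamics via string similarity."""
    return SequenceMatcher(None, observed, annotated).ratio() >= threshold

def state_matches(observed: str, annotated: str) -> bool:
    """Multi-level matching: exact first, fuzzy as a fallback."""
    return exact_match(observed, annotated) or fuzzy_match(observed, annotated)

def task_completed(trajectory: list[str], essential_states: list[str]) -> bool:
    """A task succeeds iff every annotated essential state is traversed,
    in order (the shared iterator consumes the trajectory left to right)."""
    it = iter(trajectory)
    return all(any(state_matches(obs, ann) for obs in it)
               for ann in essential_states)
```

Under this sketch, an agent's full screen trajectory is reduced to a pass/fail check against the annotated states, which is what makes the evaluation scalable.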
Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs
Wasti, Syed Mekael, Pu, Ken Q., Neshati, Ali
The modern world relies on and is driven by software. Embedded systems, command-line interfaces, and user interface (UI) software are present across systems all around the world. Their ease of use and intuitive nature have made UI systems a staple and a crucial tool in modern software and beyond. UI systems serve as a visually appealing packaging of function calls and event handlers, allowing complex event pipelines and data flows to be abstracted by buttons, text fields, menus, etc. The evolution of large language models (LLMs) over the past year has exhibited true "cognitive" potential. This potent ability has unveiled innumerable new opportunities to revolutionize the way our contemporary software systems operate. In this paper, we explore our vision and progress toward developing a UI architectural paradigm that employs a multimodal engine powered by LLMs and state-of-the-art transformer models. This framework aims to abstract monotonous UI interactions with "cognitively aware" prompting mechanisms, powering automated function calling and data-flow pipelines that translate into full speech-based, intelligent control over visual UI systems.
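The core of such a pipeline is the mapping from a speech transcript to a UI function call. The sketch below stubs out the LLM with a hard-coded parser so the dispatch plumbing is visible; every name here (the handler registry, the command grammar) is an illustrative assumption, not the paper's architecture.

```python
# Sketch: a transcript is turned into a structured call (normally by the
# LLM engine, stubbed here), then dispatched to a registered UI handler.
UI_HANDLERS = {}

def ui_handler(name):
    """Register a function as a voice-invokable UI action."""
    def wrap(fn):
        UI_HANDLERS[name] = fn
        return fn
    return wrap

@ui_handler("set_text_field")
def set_text_field(field: str, value: str) -> str:
    """A toy event handler standing in for a real UI widget update."""
    return f"{field} <- {value!r}"

def llm_to_call(transcript: str) -> dict:
    """Stand-in for the multimodal LLM engine: map speech to a JSON-like call.
    A real system would prompt the LLM with the UI's schema of handlers."""
    if transcript.startswith("type "):
        _, value, _, field = transcript.split(" ", 3)
        return {"name": "set_text_field", "args": {"field": field, "value": value}}
    raise ValueError("unrecognized command")

def dispatch(transcript: str) -> str:
    call = llm_to_call(transcript)
    return UI_HANDLERS[call["name"]](**call["args"])
```

The registry-plus-dispatch shape is what lets the LLM stay decoupled from the UI code: handlers are ordinary functions, and the model only emits names and arguments.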
Development of a Legal Document AI-Chatbot
Devaraj, Pranav Nataraj, P, Rakesh Teja V, Gangrade, Aaryav, R, Manoj Kumar
With the exponential growth of digital data and the increasing complexity of legal documentation, there is a pressing need for efficient and intelligent tools to streamline the handling of legal documents. Given recent developments in AI, and in chatbots especially, they are a compelling solution to this problem. We present an insight into the process of creating a Legal Documentation AI Chatbot with as many relevant features as possible within the given time frame. The development of each component of the chatbot is presented in detail, and each component's workings and functionality are discussed, from the build of the Android app and the Langchain query-processing code through to the integration of both via a Flask backend and REST API methods.
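The app-to-backend integration the abstract describes can be sketched in a framework-agnostic way: the Android app POSTs a JSON question, the backend runs it through the query-processing chain, and a JSON answer comes back. The endpoint shape, field names, and the keyword-lookup stub below are all assumptions for illustration; a real deployment would mount the handler in Flask and call the actual Langchain chain.

```python
# Sketch of the REST glue: JSON request in, (stubbed) query chain, JSON out.
import json

def run_query_chain(question: str, documents: dict[str, str]) -> str:
    """Stand-in for the Langchain query-processing code: naive keyword lookup."""
    for title, text in documents.items():
        if any(word.lower() in text.lower() for word in question.split()):
            return f"See '{title}': {text}"
    return "No relevant clause found."

def handle_ask(request_body: str, documents: dict[str, str]) -> str:
    """Body of a hypothetical /ask endpoint, mountable under a Flask route."""
    payload = json.loads(request_body)
    answer = run_query_chain(payload["question"], documents)
    return json.dumps({"answer": answer})
```

Keeping the handler a plain function of `request body -> response body` makes it testable without spinning up the Flask server or the Android client.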
Pairwise GUI Dataset Construction Between Android Phones and Tablets
Hu, Han, Zhan, Haolan, Huang, Yujin, Liu, Di
In the current landscape of pervasive smartphones and tablets, apps frequently exist across both platforms. Although apps share most graphical user interfaces (GUIs) and functionalities across phones and tablets, developers often rebuild tablet versions from scratch, escalating costs and squandering existing design resources. Researchers are attempting to collect data and employ deep learning in automated GUI development to enhance developers' productivity. There are currently several publicly accessible GUI page datasets for phones, but none for pairwise GUIs between phones and tablets. This poses a significant barrier to the employment of deep learning in automated GUI development. In this paper, we introduce the Papt dataset, a pioneering pairwise GUI dataset tailored for Android phones and tablets, encompassing 10,035 phone-tablet GUI page pairs sourced from 5,593 unique app pairs. We propose novel pairwise GUI collection approaches for constructing this dataset and delineate its advantages over currently prevailing datasets in the field. Through preliminary experiments on this dataset, we analyze the present challenges of utilizing deep learning in automated GUI development.
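To make the notion of a "phone-tablet GUI page pair" concrete, here is a toy record layout and pairing step, matching captures of the same app page across devices. The field names and the match-by-(app, page) rule are illustrative assumptions, not the Papt dataset's actual schema or collection approach.

```python
# Sketch: pair phone and tablet captures of the same app page.
from dataclasses import dataclass

@dataclass(frozen=True)
class GUIPage:
    app_id: str
    page: str        # e.g. an activity or screen name
    device: str      # "phone" or "tablet"
    screenshot: str  # path to the captured image

def pair_pages(phone_pages: list[GUIPage],
               tablet_pages: list[GUIPage]) -> list[tuple[GUIPage, GUIPage]]:
    """Match phone and tablet captures of the same app page into GUI pairs."""
    tablet_index = {(p.app_id, p.page): p for p in tablet_pages}
    return [(p, tablet_index[(p.app_id, p.page)])
            for p in phone_pages if (p.app_id, p.page) in tablet_index]
```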
UI Layers Merger: Merging UI layers via Visual Learning and Boundary Prior
Chen, Yun-nong, Zhen, Yan-kun, Shi, Chu-ning, Li, Jia-zhi, Chen, Liu-qing, Li, Ze-jian, Sun, Ling-yun, Zhou, Ting-ting, Chang, Yan-fang
With the fast-growing GUI development workload in the Internet industry, some work on intelligent methods has attempted to generate maintainable front-end code from UI screenshots. Utilizing UI design drafts, which contain UI metadata, can be more suitable for this purpose. However, fragmented layers inevitably appear in UI design drafts, which greatly reduces the quality of code generation. None of the existing GUI automation techniques detects and merges fragmented layers to improve the accessibility of the generated code. In this paper, we propose UI Layers Merger (UILM), a vision-based method that can automatically detect and merge fragmented layers into UI components. UILM comprises a Merging Area Detector (MAD) and a layers-merging algorithm. MAD incorporates boundary prior knowledge to accurately detect the boundaries of UI components. The layers-merging algorithm then searches out the associated layers within a component's boundary and merges them into a whole. We present a dynamic data augmentation approach to boost the performance of MAD, and we construct a large-scale UI dataset for training MAD and testing the performance of UILM. Experiments show that the proposed method outperforms the best baseline on merging-area detection and achieves decent accuracy on layers merging.
UIBert: Learning Generic Multimodal Representations for UI Understanding
Bai, Chongyang, Zang, Xiaoxue, Xu, Ying, Sunkara, Srinivas, Rastogi, Abhinav, Chen, Jindong, Arcas, Blaise Aguera y
To improve the accessibility of smart devices and to simplify their usage, it is critical to build models that understand user interfaces (UIs) and assist users in completing their tasks. However, UI-specific characteristics pose unique challenges, such as how to effectively leverage multimodal UI features that involve image, text, and structural metadata, and how to achieve good performance when high-quality labeled data is unavailable. To address these challenges we introduce UIBert, a transformer-based joint image-text model trained through novel pre-training tasks on large-scale unlabeled UI data to learn generic feature representations for a UI and its components. Our key intuition is that the heterogeneous features in a UI are self-aligned, i.e., the image and text features of UI components are predictive of each other. We propose five pretraining tasks utilizing this self-alignment among different features of a UI component and across various components in the same UI. We evaluate our method on nine real-world downstream UI tasks, where UIBert outperforms strong multimodal baselines by up to 9.26% accuracy.
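The self-alignment intuition above can be illustrated with a toy check: within a UI, a component's image embedding should be most similar to its own text embedding rather than to other components'. The hand-made vectors and cosine criterion below are stand-ins for learned features and a contrastive objective, not UIBert's actual pretraining tasks.

```python
# Toy self-alignment: each image feature should best match its own text feature.
import math

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def align(image_feats: list[list[float]],
          text_feats: list[list[float]]) -> list[int]:
    """For each component's image feature, pick the best-matching text feature."""
    return [max(range(len(text_feats)), key=lambda j: cosine(img, text_feats[j]))
            for img in image_feats]
```

In a pretraining setup, the prediction `align(...) == [0, 1, 2, ...]` (each component matched to itself) is the self-supervised signal: no labels are needed, only the co-occurrence of a component's image and text within the same UI.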