Large Language User Interfaces: Voice Interactive User Interfaces powered by LLMs

Wasti, Syed Mekael, Pu, Ken Q., Neshati, Ali

arXiv.org Artificial Intelligence 

The modern world relies on and is driven by software. Embedded systems, command-line interfaces, and user interface (UI) software are present in systems all around the world. Their ease of use and intuitive nature have made UI systems a crucial tool in modern software and beyond. UI systems serve as a visually appealing packaging of function calls and event handlers, allowing complex event pipelines and data flows to be abstracted behind buttons, text fields, menus, and the like. The advances made in large language models (LLMs) over the past year have exhibited genuine "cognitive" potential. This potent ability unveils innumerable opportunities to revolutionize the way our contemporary software systems operate. In this paper, we present our vision and progress toward a UI architectural paradigm that employs a multimodal engine powered by LLMs and state-of-the-art transformer models. The framework abstracts monotonous UI interactions behind "cognitively aware" prompting mechanisms that drive automated function calling and data-flow pipelines, yielding full speech-based, intelligent control over visual UI systems.
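To make the idea of LLM-driven function calling over a UI concrete, the following is a minimal sketch of the pattern the abstract describes: a transcribed voice command is translated by a language model into a structured function call, which is then dispatched to a registered UI event handler. All names here (the handler registry, `mock_llm`, `dispatch`) are illustrative assumptions, not the paper's actual API; the LLM is stubbed out with a keyword-based placeholder.

```python
import json
from typing import Callable, Dict

# Registry of UI actions and their event handlers, standing in for the
# "function calls and event handlers" a UI normally hides behind widgets.
HANDLERS: Dict[str, Callable[..., str]] = {
    "click_button": lambda name: f"clicked {name}",
    "set_text": lambda field, value: f"set {field} to {value!r}",
}

def mock_llm(transcript: str) -> str:
    """Placeholder for the LLM: maps a transcript to a structured
    function call encoded as JSON. A real system would prompt a model
    (given the handler schema) to emit this structure."""
    if "type" in transcript:
        return json.dumps({"name": "set_text",
                           "args": {"field": "search", "value": "cats"}})
    return json.dumps({"name": "click_button", "args": {"name": "submit"}})

def dispatch(transcript: str) -> str:
    """Translate a transcribed voice command into a UI handler call."""
    call = json.loads(mock_llm(transcript))
    return HANDLERS[call["name"]](**call["args"])

print(dispatch("please type cats into the search box"))
print(dispatch("press submit"))
```

In a full system, the transcript would come from a speech-to-text model and the JSON from a prompted LLM; the dispatch layer, however, stays this simple, which is what lets the speech interface reuse the UI's existing handlers.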