 task automation


TaskBench: Benchmarking Large Language Models for Task Automation

Neural Information Processing Systems

In recent years, the remarkable progress of large language models (LLMs) has sparked interest in task automation, which involves decomposing complex tasks described by user instructions into sub-tasks and invoking external tools to execute them, playing a central role in autonomous agents. However, there is a lack of systematic and standardized benchmarks to promote the development of LLMs in task automation. To address this, we introduce TaskBench, a comprehensive framework to evaluate the capability of LLMs in task automation. Specifically, task automation can be divided into three critical stages: task decomposition, tool selection, and parameter prediction. To tackle the complexities inherent in these stages, we introduce the concept of Tool Graph to represent decomposed tasks and adopt a back-instruct method to generate high-quality user instructions.




TaskBench: Benchmarking Large Language Models for Task Automation

Shen, Yongliang, Song, Kaitao, Tan, Xu, Zhang, Wenqi, Ren, Kan, Yuan, Siyu, Lu, Weiming, Li, Dongsheng, Zhuang, Yueting

arXiv.org Artificial Intelligence

Recently, the incredible progress of large language models (LLMs) has ignited the spark of task automation, which decomposes the complex tasks described by user instructions into sub-tasks, invokes external tools to execute them, and plays a central role in autonomous agents. However, there is no systematic and standardized benchmark to foster the development of LLMs in task automation. To this end, we introduce TaskBench to evaluate the capability of LLMs in task automation. Specifically, task automation can be formulated into three critical stages: task decomposition, tool invocation, and parameter prediction to fulfill user intent. This complexity makes data collection and evaluation more challenging compared to common NLP tasks. To generate high-quality evaluation datasets, we introduce the concept of Tool Graph to represent the decomposed tasks in user intent, and adopt a back-instruct method to simulate user instruction and annotations. Furthermore, we propose TaskEval to evaluate the capability of LLMs from different aspects, including task decomposition, tool invocation, and parameter prediction. Experimental results demonstrate that TaskBench can effectively reflect the capability of LLMs in task automation. Benefiting from the mixture of automated data construction and human verification, TaskBench achieves a high consistency compared to the human evaluation, and can be utilized as a comprehensive and faithful benchmark for LLM-based autonomous agents.
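
As a rough illustration of the three stages named in the abstract, the sketch below represents a decomposed task as a small graph of tool calls and compares a predicted graph against a reference one. It is not the TaskBench or TaskEval implementation: the class names, the tool names, and the choice of set-based F1 over tools, dependency edges, and parameters are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One node of a hypothetical tool graph: a tool plus its parameters."""
    tool: str
    params: tuple = ()  # (name, value) pairs


@dataclass
class ToolGraph:
    """Decomposed task: tool calls plus (producer, consumer) dependency edges."""
    nodes: list
    edges: set


def f1(predicted: set, gold: set) -> float:
    """Set-based F1; two empty sets count as a perfect match."""
    if not predicted and not gold:
        return 1.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def param_set(graph: ToolGraph) -> set:
    """Flatten a graph into (tool, parameter name, value) triples."""
    return {(n.tool, k, v) for n in graph.nodes for k, v in n.params}


def score(pred: ToolGraph, gold: ToolGraph) -> dict:
    """Assumed metrics: one F1 per stage (tools, dependencies, parameters)."""
    return {
        "node_f1": f1({n.tool for n in pred.nodes}, {n.tool for n in gold.nodes}),
        "edge_f1": f1(pred.edges, gold.edges),
        "param_f1": f1(param_set(pred), param_set(gold)),
    }


# Hypothetical reference plan: caption an image, translate it, read it aloud.
gold = ToolGraph(
    nodes=[
        ToolCall("image_captioning", (("image", "photo.jpg"),)),
        ToolCall("translation", (("target_lang", "fr"),)),
        ToolCall("text_to_speech"),
    ],
    edges={("image_captioning", "translation"), ("translation", "text_to_speech")},
)

# Hypothetical model output: misses the last tool and picks the wrong language.
pred = ToolGraph(
    nodes=[
        ToolCall("image_captioning", (("image", "photo.jpg"),)),
        ToolCall("translation", (("target_lang", "de"),)),
    ],
    edges={("image_captioning", "translation")},
)

scores = score(pred, gold)
print(scores)
```

Scoring the stages separately makes it visible whether a model fails at picking tools, at wiring their dependencies, or at filling in arguments, which is the kind of breakdown the abstract attributes to TaskEval.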


Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators

Zhang, Zhizheng, Zhang, Xiaoyi, Xie, Wenxuan, Lu, Yan

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown a promising prospect in automatically completing tasks upon user instructions, functioning as brain-like coordinators. The associated risks will be revealed as we delegate an increasing number of tasks to machines for automated completion. A big question emerges: how can we make machines behave responsibly when helping humans automate tasks as personal copilots? In this paper, we explore this question in depth from the perspectives of feasibility, completeness and security. Specifically, we present Responsible Task Automation (ResponsibleTA) as a fundamental framework to facilitate responsible collaboration between LLM-based coordinators and executors for task automation with three empowered capabilities: 1) predicting the feasibility of the commands for executors; 2) verifying the completeness of executors; 3) enhancing the security (e.g., the protection of users' privacy). We further propose and compare two paradigms for implementing the first two capabilities. One is to leverage the generic knowledge of LLMs themselves via prompt engineering, while the other is to adopt domain-specific learnable models. Moreover, we introduce a local memory mechanism for achieving the third capability. We evaluate our proposed ResponsibleTA on UI task automation and hope it could bring more attention to making LLMs more responsible in diverse scenarios.


Empowering LLM to use Smartphone for Intelligent Task Automation

Wen, Hao, Li, Yuanchun, Liu, Guohong, Zhao, Shanhui, Yu, Tao, Li, Toby Jia-Jun, Jiang, Shiqi, Liu, Yunhao, Zhang, Yaqin, Liu, Yunxin

arXiv.org Artificial Intelligence

Mobile task automation is an attractive technique that aims to enable voice-based hands-free user interaction with smartphones. However, existing approaches suffer from poor scalability due to the limited language understanding ability and the non-trivial manual efforts required from developers or end-users. The recent advance of large language models (LLMs) in language understanding and reasoning inspires us to rethink the problem from a model-centric perspective, where task preparation, comprehension, and execution are handled by a unified language model. In this work, we introduce AutoDroid, a mobile task automation system that can handle arbitrary tasks on any Android application without manual efforts. The key insight is to combine the commonsense knowledge of LLMs and domain-specific knowledge of apps through automated dynamic analysis. The main components include a functionality-aware UI representation method that bridges the UI with the LLM, exploration-based memory injection techniques that augment the app-specific domain knowledge of LLM, and a multi-granularity query optimization module that reduces the cost of model inference. We integrate AutoDroid with off-the-shelf LLMs including online GPT-4/GPT-3.5 and on-device Vicuna, and evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks. The results demonstrated that AutoDroid is able to precisely generate actions with an accuracy of 90.9%, and complete tasks with a success rate of 71.3%, outperforming the GPT-4-powered baselines by 36.4% and 39.7%. The demo, benchmark suites, and source code of AutoDroid will be released at https://autodroid-sys.github.io/.


How Artificial Intelligence is transforming the industry

#artificialintelligence

Artificial intelligence is transforming industry in many ways. From task automation to decision making, it is changing the way we work and interact with the world. In this article, we will explore how AI is being used in different industries and how it is driving efficiency and productivity. Artificial intelligence is changing the way we work in several ways. First, AI is being used to automate tasks and processes.


How AI is Improving Cloud Computing for Enterprises - ONLINE LIKE

#artificialintelligence

The first two decades of the 21st century have been marked by exponential advances in technology that were once considered elements of a science fiction movie script. Technologies like artificial intelligence (AI) and cloud computing have stood the test of time and have become mainstream. In this article, we'll look at what these technologies are and how their combination has been a landscape-changing force in the world of modern technology. Simply put, artificial intelligence is the simulation of human intelligence by machines. The integration of artificial intelligence into business allows it to perceive and observe the environment and generate optimal results accordingly, very similar to how people operate, although much faster.


The Future Of AI Process Automation In Marketing

#artificialintelligence

In the past several years, marketers have embraced artificial intelligence technologies to automate a broad range of high-volume, data-intensive tasks from ad targeting to image manipulation. The next phase of AI in marketing has the potential to deliver a much larger impact as the focus shifts from the automation of single tasks to more complex business processes and workflows, and ultimately to influencing marketing strategy. Task automation using AI will continue to add value to marketers, but its benefits will be dwarfed by the intelligent automation of complex workflows. To understand the enormous difference between task automation and process automation, consider the evolution of automotive interfaces. In the early 2000s, we started to see basic voice automation in cars.