TaskBench: Benchmarking Large Language Models for Task Automation

Neural Information Processing Systems 

To address this, we introduce TASKBENCH, a comprehensive framework to evaluate the capability of LLMs in task automation. Specifically, task automation can be divided into three critical stages: task decomposition, tool selection, and parameter prediction. To tackle the complexities inherent in these stages, we introduce the concept of Tool Graph to represent decomposed tasks and adopt a back-instruct method to generate high-quality user instructions. We propose TASKEVAL, a multi-faceted evaluation methodology that assesses LLM performance across these three stages.
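
As a rough illustration of how a Tool Graph might represent a decomposed task, the sketch below models tools as nodes with predicted parameters and dependencies as directed edges, then derives an execution order. This is not the TASKBENCH implementation: the class names, the methods, and the example tools (`image_captioning`, `translation`, `text_to_speech`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class ToolNode:
    """One selected tool plus its predicted parameters (hypothetical schema)."""
    name: str
    params: dict = field(default_factory=dict)


@dataclass
class ToolGraph:
    """Directed graph: nodes are tools, an edge (src, dst) means dst depends on src."""
    nodes: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def add_tool(self, name, **params):
        self.nodes[name] = ToolNode(name, params)

    def add_dependency(self, src, dst):
        # Both endpoints must already be registered as tools.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown tool in edge ({src!r}, {dst!r})")
        self.edges.add((src, dst))

    def execution_order(self):
        # Topologically sort so every tool runs after the tools it depends on.
        sorter = TopologicalSorter({name: set() for name in self.nodes})
        for src, dst in self.edges:
            sorter.add(dst, src)
        return list(sorter.static_order())


def build_example_graph():
    """Assumed decomposition of: 'Describe this image in French and read it aloud.'"""
    graph = ToolGraph()
    graph.add_tool("image_captioning", image="photo.jpg")
    graph.add_tool("translation", target_language="fr")
    graph.add_tool("text_to_speech")
    graph.add_dependency("image_captioning", "translation")
    graph.add_dependency("translation", "text_to_speech")
    return graph


if __name__ == "__main__":
    print(build_example_graph().execution_order())
```

In this sketch the three stages map loosely onto the structure: the set of nodes and edges reflects task decomposition, the node names reflect tool selection, and each node's `params` reflects parameter prediction.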
