CABENCH: Benchmarking Composable AI for Solving Complex Tasks through Composing Ready-to-Use Models
Pham, Tung-Thuy; Luong, Duy-Quan; Duong, Minh-Quan; Nguyen, Trung-Hieu; Nguyen, Thu-Trang; Nguyen, Son; Vo, Hieu Dinh
Composable AI offers a scalable and effective paradigm for tackling complex AI tasks by decomposing them into sub-tasks and solving each sub-task using ready-to-use well-trained models. However, systematically evaluating methods under this setting remains largely unexplored. In this paper, we introduce CABENCH, the first public benchmark comprising 70 realistic composable AI tasks, along with a curated pool of 700 models across multiple modalities and domains. We also propose an evaluation framework to enable end-to-end assessment of composable AI solutions. To establish initial baselines, we provide human-designed reference solutions and compare their performance with two LLM-based approaches. Our results illustrate the promise of composable AI in addressing complex real-world problems while highlighting the need for methods that can fully unlock its potential by automatically generating effective execution pipelines.
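As a rough illustration of the paradigm the abstract describes, a composable solution can be pictured as a sequence of sub-task steps, each handled by a model selected from a pool, with the composed pipeline scored end to end. The sketch below is only a toy under stated assumptions: the names `MODEL_POOL`, `run_pipeline`, and `accuracy`, and the two stand-in "models", are hypothetical and are not taken from CABENCH or its evaluation framework.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for ready-to-use models. In a real setting each
# entry would wrap a pretrained model; these are simple string functions.
def toy_normalize(text: str) -> str:
    """Sub-task 1 (assumed): clean the raw input."""
    return text.strip().lower()


def toy_sentiment(text: str) -> str:
    """Sub-task 2 (assumed): classify the cleaned text."""
    return "positive" if "good" in text else "negative"


# Assumed model pool: maps a model name to a callable.
MODEL_POOL: Dict[str, Callable[[str], str]] = {
    "normalize": toy_normalize,
    "sentiment": toy_sentiment,
}


def run_pipeline(step_names: List[str], task_input: str) -> str:
    """Run the named models in order, feeding each output to the next step."""
    result = task_input
    for name in step_names:
        result = MODEL_POOL[name](result)
    return result


def accuracy(predictions: List[str], labels: List[str]) -> float:
    """End-to-end score: fraction of pipeline outputs matching the labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    inputs = ["  This is GOOD ", "Quite bad"]
    labels = ["positive", "negative"]
    preds = [run_pipeline(["normalize", "sentiment"], x) for x in inputs]
    print(preds, accuracy(preds, labels))
```

In this picture, a human-designed reference solution and an LLM-generated one would differ only in which step list they propose for a task; the same end-to-end scoring applies to both.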
- Overview (0.68)
- Research Report > New Finding (0.48)
- Health & Medicine (0.68)
- Information Technology (0.68)