Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Neural Information Processing Systems 

Autonomous agents that accomplish complex computer tasks with minimal human interventions can significantly enhance accessibility and productivity of humancomputer interactions. Existing benchmarks either lack interactive environments or are limited to specific applications/domains, failing to reflect the diversity and complexity of real-world computer use and limiting agent scalability.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found