OPENCUA: Open Foundations for Computer-Use Agents
Neural Information Processing Systems
Vision-language models have demonstrated impressive capabilities as computer-use agents (CUAs) that automate diverse computer tasks. As their commercial potential grows, critical details of the most capable CUA systems remain closed. Because these agents will increasingly mediate digital interactions and execute consequential decisions on our behalf, the research community needs access to open CUA frameworks to study their capabilities, limitations, and risks. To bridge this gap, we propose OPENCUA, a comprehensive open-source framework for scaling CUA data and foundation models. Our framework consists of: (1) an annotation infrastructure that seamlessly captures human computer-use demonstrations; (2) AGENTNET, the first large-scale computer-use task dataset, spanning 3 operating systems and 200+ applications and websites; (3) a scalable pipeline that transforms demonstrations into state-action pairs with reflective long Chain-of-Thought reasoning, sustaining robust performance gains as data scales.
Jun-22-2026, 19:13:06 GMT
- Genre:
  - Workflow (1.00)
  - Research Report
    - New Finding (1.00)
    - Experimental Study (1.00)
- Industry:
  - Information Technology
    - Security & Privacy (1.00)
    - Services (0.68)
- Technology:
  - Information Technology
    - Software (1.00)
    - Information Management > Search (1.00)
    - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Natural Language > Large Language Model (1.00)
    - Vision (0.87)
    - Machine Learning > Neural Networks (0.68)
    - Representation & Reasoning > Agents (0.67)
    - Cognitive Science > Problem Solving (0.67)