Copilot Arena: A platform for code
Copilot Arena is a VS Code extension that collects human preferences for code directly from developers. As model capabilities improve, large language models (LLMs) are increasingly integrated into user environments and workflows. In particular, software developers write code with LLM-powered tools in integrated development environments (IDEs) such as VS Code, IntelliJ, or Eclipse. Although these tools see growing use in practice, current LLM evaluations struggle to capture how users interact with them in real environments: they are often limited to short user studies, consider only simple programming tasks rather than real-world systems, or rely on web-based platforms removed from the development environment. To address these limitations, we introduce Copilot Arena, an extension designed to evaluate LLMs in real-world settings by collecting preferences directly within a developer's actual workflow.
Apr-28-2025, 08:24:34 GMT