The AI Consumer Index (ACE)
Benchek, Julien; Shetty, Rohit; Hunsberger, Benjamin; Arun, Ajay; Richards, Zach; Foody, Brendan; Nitski, Osvald; Vidgen, Bertie
arXiv.org Artificial Intelligence
We introduce the first version of the AI Consumer Index (ACE), a benchmark for assessing whether frontier AI models can perform everyday consumer tasks. ACE contains a hidden held-out set of 400 test cases, split across four consumer activities: shopping, food, gaming, and DIY. We are also open-sourcing 80 cases as a dev set under a CC-BY license. For the ACE leaderboard, we evaluated 10 frontier models (with web search turned on) using a novel grading methodology that dynamically checks whether relevant parts of the response are grounded in the retrieved web sources. GPT 5 (Thinking = High) is the top-performing model, scoring 56.1%, followed by o3 Pro (Thinking = On) at 55.2% and GPT 5.1 (Thinking = High) at 55.1%. Model scores differ across domains; in Shopping, the top model scores under 50%. We find that models are prone to hallucinating key information, such as prices. ACE shows a substantial gap between the performance of even the best models and consumers' AI needs.
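The abstract does not describe how ACE's grader decides that part of a response is "grounded." The sketch below is a hypothetical, deliberately narrow illustration of the idea for one kind of key information the abstract highlights, prices: it extracts dollar amounts from a model response and flags any that do not appear in the retrieved source texts. The function names, the regex, and the rule that a response with no price claims scores 1.0 are all assumptions made for illustration; this is not the ACE grading method.

```python
import re

# Hypothetical pattern for US-dollar amounts such as "$49.99" or "$1,299".
# Assumption: prices are written with a leading "$"; other formats are ignored.
PRICE_RE = re.compile(r"\$\d[\d,]*(?:\.\d{2})?")


def extract_prices(text: str) -> set[float]:
    """Return the set of numeric dollar amounts mentioned in `text`."""
    # Strip "$" and thousands separators (and any trailing comma) before parsing.
    return {float(m[1:].replace(",", "")) for m in PRICE_RE.findall(text)}


def ungrounded_prices(response: str, sources: list[str]) -> set[float]:
    """Prices claimed in the response that appear in none of the sources."""
    source_prices = set().union(*(extract_prices(s) for s in sources))
    return extract_prices(response) - source_prices


def grounding_score(response: str, sources: list[str]) -> float:
    """Fraction of claimed prices that are found in the sources.

    Assumption: a response that makes no price claims is treated as fully
    grounded (score 1.0), since it cannot have hallucinated a price.
    """
    claimed = extract_prices(response)
    if not claimed:
        return 1.0
    missing = ungrounded_prices(response, sources)
    return (len(claimed) - len(missing)) / len(claimed)


if __name__ == "__main__":
    # Hypothetical retrieved web snippets and model response.
    sources = [
        "The Acme kettle is listed at $49.99 with free shipping.",
        "Bundle price: $1,299 for the full set.",
    ]
    response = "The Acme kettle costs $49.99 and the set is $1,299, or $59.00 on sale."
    print(ungrounded_prices(response, sources))  # {59.0}
    print(grounding_score(response, sources))
```

Exact string matching on prices is the simplest possible grounding test; a real grader would also need to handle currencies, rounding, and claims that are not numeric, which is presumably why ACE checks grounding dynamically per response rather than with a fixed rule like this one.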
Dec-10-2025