narasimhan
Beyond Accuracy: A Multi-Dimensional Framework for Evaluating Enterprise Agentic AI Systems
Current agentic AI benchmarks predominantly evaluate task completion accuracy, while overlooking critical enterprise requirements such as cost-efficiency, reliability, and operational stability. Through systematic analysis of 12 main benchmarks and empirical evaluation of state-of-the-art agents, we identify three fundamental limitations: (1) absence of cost-controlled evaluation leading to 50x cost variations for similar precision, (2) inadequate reliability assessment where agent performance drops from 60\% (single run) to 25\% (8-run consistency), and (3) missing multidimensional metrics for security, latency, and policy compliance. We propose \textbf{CLEAR} (Cost, Latency, Efficacy, Assurance, Reliability), a holistic evaluation framework specifically designed for enterprise deployment. Evaluation of six leading agents on 300 enterprise tasks demonstrates that optimizing for accuracy alone yields agents 4.4-10.8x more expensive than cost-aware alternatives with comparable performance. Expert evaluation (N=15) confirms that CLEAR better predicts production success (correlation $ρ=0.83$) compared to accuracy-only evaluation ($ρ=0.41$).
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Leisure & Entertainment > Games (0.68)
- Education > Educational Setting (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- North America > United States (0.28)
- Asia > India (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > Canada (0.04)
- North America > United States (0.28)
- Asia > India (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > Canada (0.04)
We first address the common concerns raised
We greatly appreciate the reviewers for taking a close look at the paper and the proofs, and giving a detailed feedback. We first address the common concerns raised. This is the same dependence that the previous method of Narasimhan (2018) incurs [26]. We'll definitely include this in the appendix. We'll include these details in the paper, along with an example.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Leisure & Entertainment > Sports (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Education (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Leisure & Entertainment > Sports (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Education (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Leisure & Entertainment > Games (0.68)
- Education > Educational Setting (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- North America > United States (0.28)
- Asia > India (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > Canada (0.04)
- North America > United States (0.28)
- Asia > India (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)