Towards Outcome-Oriented, Task-Agnostic Evaluation of AI Agents