Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems