Benchmarking the Spectrum of Agent Capabilities