Benchmarks, Test Beds, Controlled Experimentation, and the Design of Agent Architectures