THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation