Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress