StreamBench: Towards Benchmarking Continuous Improvement of Language Agents