StreamBench: Towards Benchmarking Continuous Improvement of Language Agents
Neural Information Processing Systems
To address this gap, we introduce StreamBench, a pioneering benchmark designed to evaluate the continuous improvement of LLM agents over an input-feedback sequence.
Feb-17-2026, 23:01:39 GMT