StreamBench: Towards Benchmarking Continuous Improvement of Language Agents
Neural Information Processing Systems
To address this gap, we introduce StreamBench, a pioneering benchmark designed to evaluate the continuous improvement of LLM agents over an input-feedback sequence.
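To make "continuous improvement over an input-feedback sequence" concrete, here is a minimal sketch of a generic streaming evaluation loop. The agent answers each input, receives feedback, and may update itself before the next input arrives. Score is accuracy over the whole stream. This is not StreamBench's actual harness. The `StreamingAgent` protocol, `run_stream`, the binary correct/incorrect feedback signal, and the toy `TrialAndErrorAgent` and `StaticAgent` are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Protocol, Tuple


class StreamingAgent(Protocol):
    # Hypothetical interface: answer an input, then learn from feedback on it.
    def predict(self, x: str) -> str: ...
    def update(self, x: str, y_hat: str, feedback: bool) -> None: ...


def run_stream(agent: StreamingAgent, stream: List[Tuple[str, str]]) -> float:
    """Feed inputs one at a time; return accuracy over the whole sequence."""
    correct = 0
    for x, y_true in stream:
        y_hat = agent.predict(x)
        feedback = y_hat == y_true         # assumed binary correctness signal
        agent.update(x, y_hat, feedback)   # agent may improve before next input
        correct += int(feedback)
    return correct / len(stream) if stream else 0.0


class TrialAndErrorAgent:
    """Toy agent: on negative feedback, tries the next candidate for that input."""

    def __init__(self, candidates: List[str]) -> None:
        self.candidates = candidates
        self.memory: Dict[str, int] = {}   # input -> index of current guess

    def predict(self, x: str) -> str:
        return self.candidates[self.memory.get(x, 0)]

    def update(self, x: str, y_hat: str, feedback: bool) -> None:
        if not feedback:
            self.memory[x] = (self.memory.get(x, 0) + 1) % len(self.candidates)


class StaticAgent(TrialAndErrorAgent):
    """Baseline that ignores feedback, so it never improves."""

    def update(self, x: str, y_hat: str, feedback: bool) -> None:
        pass


if __name__ == "__main__":
    stream = [("q1", "b"), ("q1", "b"), ("q2", "a"), ("q2", "a")]
    print(run_stream(TrialAndErrorAgent(["a", "b"]), stream))  # 0.75
    print(run_stream(StaticAgent(["a", "b"]), stream))         # 0.5
```

The feedback-using agent corrects its answer to `q1` after one mistake, while the static baseline repeats it. This gap in stream-level accuracy is the kind of improvement a streaming benchmark is meant to expose.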
Oct-10-2025, 15:38:59 GMT