StreamBench: Towards Benchmarking Continuous Improvement of Language Agents

Neural Information Processing Systems 

To address this gap, we introduce StreamBench, a pioneering benchmark designed to evaluate the continuous improvement of LLM agents over an input-feedback sequence.
