NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls
