tinyBenchmarks: evaluating LLMs with fewer examples
