WILT: A Multi-Turn, Memorization-Robust Inductive Logic Benchmark for LLMs
