Benchmarking Large Language Models on Controllable Generation under Diversified Instructions
