Uncovering Systematic Failures of LLMs in Verifying Code Against Natural Language Specifications
