Interactive Evaluation of Large Language Models for Multi-Requirement Software Engineering Tasks