On the Multi-turn Instruction Following for Conversational Web Agents