User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue