INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models