Evalet: Evaluating Large Language Models by Fragmenting Outputs into Functions
