How Model Size, Temperature, and Prompt Style Affect LLM-Human Assessment Score Alignment
