How Model Size, Temperature, and Prompt Style Affect LLM-Human Assessment Score Alignment