Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function
