Evaluating Large Language Models through Gender and Racial Stereotypes