SetLexSem Challenge: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models