Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models