Language models are not naysayers: An analysis of language models on negation benchmarks