How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study