Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language