Benchmarking Large Language Models for Calculus Problem-Solving: A Comparative Analysis