Humans Continue to Outperform Large Language Models in Complex Clinical Decision-Making: A Study with Medical Calculators