Generalist Large Language Models Outperform Clinical Tools on Medical Benchmarks