FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs?