Evaluating Large Language Models for Financial Reasoning: A CFA-Based Benchmark Study