HammerBench: Fine-Grained Function-Calling Evaluation in Real Mobile Device Scenarios