Beyond Description: Cognitively Benchmarking Fine-Grained Action for Embodied Agents