ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric Videos