HARP: A challenging human-annotated math reasoning benchmark