How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison