What Level of Automation is "Good Enough"? A Benchmark of Large Language Models for Meta-Analysis Data Extraction