Execution-based Evaluation for Data Science Code Generation Models