A Dataset for Evaluating LLM-based Evaluation Functions for Research Question Extraction Task