Towards Leveraging Large Language Models for Automated Medical Q&A Evaluation