THELMA: Task Based Holistic Evaluation of Large Language Model Applications-RAG Question Answering