RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering