Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation