Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning