Evaluating Long-Term Memory for Long-Context Question Answering