Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding