Can LLMs reason over extended multilingual contexts? Towards long-context evaluation beyond retrieval and haystacks
