One ruler to measure them all: Benchmarking multilingual long-context language models