XIFBench: Evaluating Large Language Models on Multilingual Instruction Following

Neural Information Processing Systems 

Large Language Models (LLMs) have demonstrated remarkable instruction-following capabilities across various applications. However, their performance in multilingual settings has not been systematically investigated, and existing evaluations lack fine-grained constraint analysis across diverse linguistic contexts.