Instruction-Tuned Video-Audio Models Elucidate Functional Specialization in the Brain