LongViTU: Instruction Tuning for Long-Form Video Understanding