Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning