Diversity as a Reward: Fine-Tuning LLMs on a Mixture of Domain-Undetermined Data