Actor-Critic based Online Data Mixing For Language Model Pre-Training