Online Learning of Weakly Coupled MDP Policies for Load Balancing and Auto Scaling