Scalable Reinforcement Learning for Virtual Machine Scheduling