Learning Branching Policies for MILPs with Proximal Policy Optimization