Beyond Fixed Morphologies: Learning Graph Policies with Trust Region Compensation in Variable Action Spaces