Learning Non-myopic Power Allocation in Constrained Scenarios