Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization