Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management
