Multi-Action Dialog Policy Learning from Logged User Feedback