Recommending the optimal policy by learning to act from temporal data