Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs