Learning an Optimal Assortment Policy under Observational Data