Learning non-Markovian Decision-Making from State-only Sequences

Aoyang Qin (1, 2), Feng Gao (3), Qing Li