[Q] Temporal Difference Learning in POMDPs • /r/MachineLearning
The environment is partially observable and, because some information is simply unavailable, will never be fully observable. Does anyone know of any models suitable for learning a value function in this setting?
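To make the question concrete, here is a minimal sketch of one simple baseline, not a recommendation: treat the last k observations as a stand-in for the hidden state and run tabular TD(0) over those history windows. The function name `td0_history_values`, the episode format (a list of `(observation, reward)` pairs), and the defaults for `k`, `alpha`, and `gamma` are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def td0_history_values(episodes, k=2, alpha=0.1, gamma=0.9):
    """Tabular TD(0) where the 'state' is the tuple of the last k observations.

    episodes: iterable of episodes; each episode is a list of
              (observation, reward) pairs, where reward is received
              on leaving that observation (hypothetical format).
    Returns a dict mapping observation-history tuples to value estimates.
    """
    V = defaultdict(float)
    for episode in episodes:
        window = deque(maxlen=k)  # sliding window of recent observations
        prev_key = None
        prev_reward = 0.0
        for obs, reward in episode:
            window.append(obs)
            key = tuple(window)
            if prev_key is not None:
                # TD(0): V(h) += alpha * (r + gamma * V(h') - V(h))
                td_error = prev_reward + gamma * V[key] - V[prev_key]
                V[prev_key] += alpha * td_error
            prev_key, prev_reward = key, reward
        if prev_key is not None:
            # Terminal step: the value of the successor is taken to be 0.
            V[prev_key] += alpha * (prev_reward - V[prev_key])
    return V


# Toy data (made up): observation "X" is aliased. It follows either "L"
# (final reward 1) or "R" (final reward 0), so "X" alone is ambiguous.
toy_episodes = [
    [("L", 0.0), ("X", 1.0)],
    [("R", 0.0), ("X", 0.0)],
] * 500

values = td0_history_values(toy_episodes, k=2)
print(values[("L", "X")], values[("R", "X")])
```

With `k=1` both branches collapse onto the single key `("X",)` and the estimate blends the two outcomes. With `k=2` the histories `("L", "X")` and `("R", "X")` get separate entries. A fixed window only helps when the relevant information shows up within the last k observations, which may not hold in an environment like the one described here.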
Jun-10-2016, 15:08:35 GMT