Review for NeurIPS paper: Inferring learning rules from animal decision-making

Neural Information Processing Systems 

Weaknesses: As the authors point out (lines 148-151), the results rely strongly on the assumption that the animal's learning rule is REINFORCE, which I think is a very strong assumption. It would be better supported by literature showing that animals can, or do, learn in a similar way. Also, as the authors acknowledge, their model is descriptive. As is the nature of a descriptive model, I do not feel that I gain much insight from it into how animals learn. For example, the authors found a non-zero update to the bias weight on incorrect trials, which explains the "incorrect" behavior of repeatedly choosing the wrong option. This sounds like "noise" in the behavior to me, and the model does not explain it further beyond treating it as noise.