Periodic agent-state based Q-learning for POMDPs