Integrated architectures for learning, planning, and reacting based on approximating dynamic programming