Reinforcement Learning in POMDP's via Direct Gradient Ascent