Leveraging Human Domain Knowledge to Model an Empirical Reward Function for a Reinforcement Learning Problem