Internal Model from Observations for Reward Shaping