Reinforcement Learning for Real Life Planning Problems


To avoid the paper being thrown in the bin we give this outcome a large negative reward, say -1, and because the teacher is pleased when it is placed in the bin, that outcome earns a large positive reward, say +1. To avoid the outcome where the paper is continually passed around the room, we set the reward for all other actions to a small negative value, say -0.04. If we set this to a positive number or zero, the model may let the paper go round and round, as it would be better to keep gaining small positives (or lose nothing) than to risk getting close to the negative outcome. The value is also kept very small because an episode collects only a single terminal reward but could take many steps to end, and we need to ensure that, if the paper is placed in the bin, the positive outcome is not cancelled out by the accumulated step penalties. Please note that the rewards are always relative to one another; I have chosen arbitrary figures, and these can be changed if the results are not as desired.
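As a rough illustration of this trade-off, the sketch below adds up the rewards for an episode using the example figures above. It assumes an undiscounted return (a plain sum of rewards with no discount factor), and the function names are hypothetical, chosen only for this example.

```python
# Example reward figures from the text; all are arbitrary and can be tuned.
STEP_REWARD = -0.04    # small penalty for every non-terminal action
GOOD_TERMINAL = 1.0    # paper is placed in the bin
BAD_TERMINAL = -1.0    # the outcome we want to avoid


def episode_return(n_steps, terminal_reward, step_reward=STEP_REWARD):
    """Sum of rewards for n_steps non-terminal actions plus one terminal reward.

    Assumption: no discounting is applied.
    """
    return n_steps * step_reward + terminal_reward


def break_even_steps(terminal_reward=GOOD_TERMINAL, step_reward=STEP_REWARD):
    """Number of steps at which the step penalties exactly cancel the terminal reward."""
    return terminal_reward / -step_reward


if __name__ == "__main__":
    # With a small negative step reward, shorter episodes score higher.
    for n in (5, 10, 20, 30):
        print(n, round(episode_return(n, GOOD_TERMINAL), 2))

    print("break-even steps:", round(break_even_steps(), 2))

    # With a positive step reward, passing the paper around for longer
    # always scores higher than finishing quickly.
    short = episode_return(5, GOOD_TERMINAL, step_reward=0.04)
    long_ = episode_return(100, GOOD_TERMINAL, step_reward=0.04)
    print("positive step reward:", round(short, 2), "vs", round(long_, 2))
```

Under these assumed figures, the positive outcome stays worthwhile as long as the episode takes fewer steps than the break-even count, which is why the step penalty needs to be small relative to the terminal rewards.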
