Control Synthesis from Linear Temporal Logic Specifications using Model-Free Reinforcement Learning
Bozkurt, Alper Kamil, Wang, Yu, Zavlanos, Michael M., Pajic, Miroslav
arXiv.org Artificial Intelligence
Arrows: actions top, left, down, and right; encircled characters: state labels. The actions in states that are not reachable or lead to another LDBA state are not displayed. In all subfigures, the most likely paths are highlighted in red.

... the baby b, the only allowed action is left and when taken the following situations can happen: (i) the robot hits the wall with probability 0.1 and wakes the baby up; (ii) the robot moves left with probability 0.8 or moves down with probability 0.1. If the baby has been woken up, which means the robot could not leave in a single time step (represented by LTL as $b \wedge \bigcirc b$), the robot should notify the adult (at state a); otherwise, the robot should directly go back to the charger (at state c). The full objective is specified in LTL as

$$\phi_2 = \Box \Big( \underbrace{\neg d}_{(1)} \;\wedge\; \underbrace{(b \wedge \neg \bigcirc b) \rightarrow \bigcirc\big(\neg b \,\mathsf{U}\, (a \vee c)\big)}_{(2)} \;\wedge\; \underbrace{a \rightarrow \bigcirc(\neg a \,\mathsf{U}\, b)}_{(3)} \;\wedge\; \underbrace{(\neg b \wedge \bigcirc b \wedge \neg \bigcirc\bigcirc b) \rightarrow (\neg a \,\mathsf{U}\, c)}_{(4)} \;\wedge\; \underbrace{c \rightarrow (\neg a \,\mathsf{U}\, b)}_{(5)} \;\wedge\; \underbrace{(b \wedge \bigcirc b) \rightarrow \Diamond a}_{(6)} \Big).$$
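The stochastic outcome of the left action near the baby, and the "baby woken" condition (the robot is at b in two consecutive time steps), can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `LEFT_OUTCOMES`, `sample_left_outcome`, and `baby_woken` are hypothetical, and treating "hits the wall" as staying in place is an interpretation of the text.

```python
import random

# Outcome probabilities of the action "left" taken next to the baby,
# as stated in the text. Assumption: hitting the wall leaves the robot
# in the same cell, which is what wakes the baby.
LEFT_OUTCOMES = {"stay": 0.1, "left": 0.8, "down": 0.1}


def sample_left_outcome(rng):
    """Sample one outcome of the 'left' action using the probabilities above."""
    outcomes = list(LEFT_OUTCOMES)
    weights = [LEFT_OUTCOMES[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def baby_woken(labels):
    """Check 'b and next b' at some step of a finite label trace.

    `labels` is a list of sets of atomic propositions, one set per time step.
    A finite trace only approximates the infinite-trace LTL semantics.
    """
    return any("b" in cur and "b" in nxt for cur, nxt in zip(labels, labels[1:]))
```

For example, `baby_woken([{"b"}, {"b"}, set()])` holds because the robot is at b in two consecutive steps, whereas a trace that leaves b after one step does not satisfy the condition.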
Sep-16-2019
- Country:
- North America > United States (0.29)
- Asia > Middle East (0.28)
- Genre:
- Research Report (0.40)
- Technology: