Recurrent Sum-Product-Max Networks for Decision Making in Perfectly-Observed Environments