Learning Human Activities and Object Affordances from RGB-D Videos