Teaching machines to reason about what they see