How Good are Foundation Models in Step-by-Step Embodied Reasoning?
