Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?
