What to Say and When to Say it: Live Fitness Coaching as a Testbed for Situated Interaction
Sunny Panchal, Guillaume Berger, Antoine Mercier
Neural Information Processing Systems
Vision-language models have shown impressive progress in recent years. However, existing models are largely limited to turn-based interactions, where each turn must be stepped (i.e., prompted) by the user. Open-ended, asynchronous interactions, where an AI model may proactively deliver timely responses or feedback based on the unfolding situation in real time, are an open challenge.
May-25-2025, 08:37:05 GMT
- Genre:
- Research Report (0.67)
- Industry:
- Health & Medicine > Consumer Health (0.93)
- Information Technology (0.67)
- Law (0.92)
- Technology: