ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

Neural Information Processing Systems

We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models and optimized data-efficiently for spoken language tasks.





IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos

Neural Information Processing Systems

While significant progress has been made in developing autonomous agents for shape assembly, existing datasets have not yet tackled the 4D grounding of assembly instructions in videos, essential for a holistic understanding of assembly in 3D space over time.