Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models
