'No' Matters: Out-of-Distribution Detection in Multimodality Long Dialogue