

 encoder


Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

Neural Information Processing Systems

TPT does not explicitly align the pre-trained CLIP to become aware of the test sample distribution. For the effective test-time adaptation of V-L foundation models, it is crucial to bridge the distribution gap between the pre-training dataset and the downstream evaluation set for high zero-shot generalization.

P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting

Sungwon Kim, Kevin J Shih

Neural Information Processing Systems

Our work proposes P-Flow, a fast and data-efficient zero-shot TTS model that uses speech prompts for speaker adaptation. P-Flow comprises a speech-prompted text encoder for speaker adaptation and a flow matching generative decoder for high-quality and fast speech synthesis.