Proactive Detection of Voice Cloning with Localized Watermarking

Roman, Robin San; Fernandez, Pierre; Défossez, Alexandre; Furon, Teddy; Tran, Tuan; Elsahar, Hady

Jan-30-2024, arXiv.org Artificial Intelligence

In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning. We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level, and a novel perceptual loss inspired by auditory masking that enables AudioSeal to achieve better imperceptibility. AudioSeal achieves state-of-the-art performance in terms of robustness to real-life audio manipulations and imperceptibility based on automatic and human evaluation metrics. Additionally, AudioSeal is designed with a fast, single-pass detector that significantly surpasses existing models in speed, achieving detection up to two orders of magnitude faster, which makes it well suited to large-scale and real-time applications.
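To make "localized detection up to the sample level" concrete, the following is a minimal sketch of how per-sample detector scores could be turned into time spans. It is not AudioSeal's code or API: the function name `watermarked_segments`, the 0.5 threshold, and the 16 kHz sample rate are all assumptions chosen for illustration, and the scores are assumed to be one probability per audio sample.

```python
import numpy as np


def watermarked_segments(probs, threshold=0.5, sample_rate=16000):
    """Return (start_s, end_s) spans where per-sample scores exceed threshold.

    Hypothetical helper: `probs` is assumed to hold one watermark
    probability per audio sample; threshold and sample_rate are
    illustrative defaults, not values taken from the paper.
    """
    probs = np.asarray(probs, dtype=float)
    mask = probs > threshold
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # first sample of each run
    ends = np.flatnonzero(edges == -1)    # one past the last sample of each run
    return [(s / sample_rate, e / sample_rate) for s, e in zip(starts, ends)]
```

Each returned pair is a half-open interval in seconds, so a detector that emits one score per sample can flag exactly which parts of a clip are likely AI-generated rather than labeling the whole file.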

detection, detector, watermark

Country:
- North America > United States
  - Maryland > Baltimore (0.04)
  - District of Columbia > Washington (0.04)

Genre:
- Research Report (1.00)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Speech (1.00)
    - Natural Language (1.00)
    - Machine Learning
      - Performance Analysis > Accuracy (1.00)
      - Neural Networks (1.00)