DarkStream: real-time speech anonymization with low latency

Sep-8-2025–arXiv.org Artificial Intelligence

Abstract: We propose DarkStream, a streaming speech synthesis model for real-time speaker anonymization. To improve content encoding under strict latency constraints, DarkStream combines a causal waveform encoder, a short lookahead buffer, and transformer-based contextual layers. To further reduce inference time, the model generates waveforms directly via a neural vocoder, thus removing intermediate mel-spectrogram conversions. Evaluations show our model achieves strong anonymization, yielding close to 50% speaker verification EER (near-chance performance) on the lazy-informed attack scenario, while maintaining acceptable linguistic intelligibility (WER within 9%). By balancing low latency, robust privacy, and minimal intelligibility degradation, DarkStream provides a practical solution for privacy-preserving real-time speech communication.

Voice recordings contain rich biometric information that reveals not only linguistic content but also personal attributes such as speaker identity, sex, and age, as well as paralinguistics (dialect/accent, emotions). Such sensitive information can be exploited by adversaries for speaker recognition and profiling, raising significant privacy concerns.
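To make the abstract's reading of "close to 50% EER" as near-chance concrete, the following is a minimal sketch of how an equal error rate can be approximated from speaker verification scores. The function name `equal_error_rate` and the toy scores are hypothetical and chosen for illustration only; this is not the paper's evaluation pipeline, and it uses a simple threshold sweep rather than any specific toolkit.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    target_scores: scores for same-speaker trials (hypothetical input).
    nontarget_scores: scores for different-speaker trials (hypothetical input).
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # The EER is taken where the two error rates are closest.
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# A verifier that separates speakers perfectly has no errors.
print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # 0.0

# If anonymization makes both score sets indistinguishable, the verifier
# can do no better than guessing.
same = [0.1, 0.2, 0.3, 0.4]
print(equal_error_rate(same, same))  # 0.5
```

In this sketch, an EER of 0.5 means the attacker's verifier accepts impostors as often as it rejects the true speaker, which is why an EER near 50% is described as near-chance performance.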


Country:
- North America > United States (0.14)

Genre:
- Research Report (0.82)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Artificial Intelligence
    - Speech > Speech Recognition (1.00)
    - Natural Language (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (0.34)
