StreamAtt: Direct Streaming Speech-to-Text Translation with Attention-based Audio History Selection

Papi, Sara, Gaido, Marco, Negri, Matteo, Bentivogli, Luisa

Jun-10-2024–arXiv.org Artificial Intelligence

Streaming speech-to-text translation (StreamST) is the task of automatically translating speech while incrementally receiving an audio stream. Unlike simultaneous ST (SimulST), which deals with pre-segmented speech, StreamST faces the challenges of handling continuous and unbounded audio streams. This requires additional decisions about what to retain of the previous history, which is impractical to keep entirely due to latency and computational constraints. Despite the real-world demand for real-time ST, research on streaming translation remains limited, with existing works solely focusing on SimulST. To fill this gap, we introduce StreamAtt, the first StreamST policy, and propose StreamLAAL, the first StreamST latency metric designed to be comparable with existing metrics for SimulST. Extensive experiments across all 8 languages of MuST-C v1.0 show the effectiveness of StreamAtt compared to a naive streaming baseline and the related state-of-the-art SimulST policy, providing a first step in StreamST research.

computational linguistic, proceedings, translation, (12 more...)

arXiv.org Artificial Intelligence

Jun-10-2024

arXiv.org PDF

Add feedback

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Nevada (0.04)
    - Maryland (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
    - California > San Diego County
      - San Diego (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Italy > Trentino-Alto Adige/Südtirol
    - Trentino Province > Trento (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.05)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - Singapore (0.04)
  - China (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language > Machine Translation (1.00)
  - Machine Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found