Exploring System Adaptations For Minimum Latency Real-Time Piano Transcription

Patricia Hu, Silvan David Peter, Jan Schlüter, Gerhard Widmer

Sep-10-2025–arXiv.org Artificial Intelligence

Advances in neural network design and the availability of large-scale labeled datasets have driven major improvements in piano transcription. Existing approaches target either offline applications, with no restrictions on computational demands, or online transcription, with delays of 128-320 ms. However, most real-time musical applications require latencies below 30 ms. In this work, we investigate whether and how the current state-of-the-art online transcription model can be adapted for real-time piano transcription. Specifically, we eliminate all non-causal processing and reduce computational load by sharing computations across core model components and by varying model size. Additionally, we explore different pre- and postprocessing strategies and related label encoding schemes, and discuss their suitability for real-time transcription. Evaluating the adaptations on the MAESTRO dataset, we find a drop in transcription accuracy due to strictly causal processing, as well as a tradeoff between preprocessing latency and prediction accuracy. We release our system as a baseline to support researchers in designing models for minimum-latency real-time transcription.
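To make the notion of "non-causal processing" concrete, the following is a minimal, generic sketch in Python with NumPy. It is not the authors' model or code; the function names `causal_conv1d` and `centered_conv1d` and the example kernel are hypothetical and chosen only for illustration. It contrasts a centered 1-D convolution, whose output at time t uses future input samples, with a left-padded (causal) convolution, whose output at time t uses only current and past samples.

```python
import numpy as np


def causal_conv1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Causal convolution: y[t] = sum_i kernel[i] * x[t - i].

    The input is zero-padded on the left only, so y[t] never
    depends on samples later than x[t].
    """
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array(
        [np.dot(padded[t:t + k], kernel[::-1]) for t in range(len(x))]
    )


def centered_conv1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Centered ("same") convolution: for a kernel longer than one tap,
    y[t] also depends on input samples after time t (lookahead)."""
    return np.convolve(x, kernel, mode="same")


if __name__ == "__main__":
    kernel = np.array([0.5, 0.3, 0.2])  # illustrative values only
    x = np.arange(10, dtype=float)

    # Same signal, but with everything from index 5 onward altered.
    x_changed = x.copy()
    x_changed[5:] += 100.0

    # Causal outputs before index 5 are unaffected by the later change.
    same_past = np.allclose(
        causal_conv1d(x, kernel)[:5], causal_conv1d(x_changed, kernel)[:5]
    )
    print("causal output unchanged before index 5:", same_past)

    # The centered output at index 4 already "sees" the sample at index 5.
    leaks = not np.isclose(
        centered_conv1d(x, kernel)[4], centered_conv1d(x_changed, kernel)[4]
    )
    print("centered output at index 4 depends on the future:", leaks)
```

In a streaming setting, any lookahead of this kind has to be waited for before an output can be emitted, which is why removing it matters when the latency budget is a few tens of milliseconds.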

artificial intelligence, machine learning, real time system

Country:
- Europe > Austria (0.28)
- North America > United States (0.28)

Genre:
- Research Report (1.00)

Industry:
- Media > Music (1.00)
- Leisure & Entertainment (1.00)

Technology:
- Information Technology
  - Architecture > Real Time Systems (1.00)
  - Artificial Intelligence > Machine Learning
    - Neural Networks (0.67)
