ANIRA: An Architecture for Neural Network Inference in Real-Time Audio Applications

Valentin Ackva and Fares Schulz

arXiv.org Artificial Intelligence 

Numerous tools for neural network inference are currently available, yet many do not meet the requirements of real-time audio applications. In response, we introduce anira, an efficient cross-platform library. To ensure compatibility with a broad range of neural network architectures and frameworks, anira supports ONNX Runtime, LibTorch, and TensorFlow Lite as backends. Each inference engine exhibits real-time violations, which anira mitigates by decoupling the inference from the audio callback to a static thread pool. The library incorporates built-in latency management and extensive benchmarking capabilities, both crucial to ensure a continuous signal flow. Three different neural network architectures for audio effect emulation are then benchmarked across various configurations, and statistical modeling is employed to identify the influence of various factors on performance. The findings indicate that ONNX Runtime exhibits the lowest runtimes for stateless models, while LibTorch demonstrates the fastest performance for stateful models. Our results also indicate that for certain model-engine combinations the initial inferences take longer and exhibit a higher incidence of real-time violations.

In recent years, neural networks have become an integral part of modern audio digital signal processing. Their applications include audio classification [1], audio transcription [2], audio source separation [3], audio synthesis [4], [5], [6], and audio effects [7]. While offline processing is inherently supported, translating these architectures to real-time implementations remains challenging.