WaveFormer: A 3D Transformer with Wavelet-Driven Feature Representation for Efficient Medical Image Segmentation
Hasan, Md Mahfuz Al, Zaman, Mahdi, Jawad, Abdul, Santamaria-Pang, Alberto, Lee, Ho Hin, Tarapov, Ivan, See, Kyle, Imran, Md Shah, Roy, Antika, Fallah, Yaser Pourmohammadi, Asadizanjani, Navid, Forghani, Reza
Transformer-based architectures have advanced medical image analysis by effectively modeling long-range dependencies, yet they often struggle in 3D settings due to substantial memory overhead and insufficient capture of fine-grained local features. We address these limitations with WaveFormer, a novel 3D transformer that: i) leverages the fundamental frequency-domain properties of features for contextual representation, and ii) is inspired by the top-down mechanism of the human visual recognition system, making it a biologically motivated architecture. By employing discrete wavelet transformations (DWT) at multiple scales, WaveFormer preserves both global context and high-frequency details while replacing heavy upsampling layers with efficient wavelet-based summarization and reconstruction. This significantly reduces the number of parameters, which is critical for real-world deployment where computational resources and training times are constrained. Furthermore, the model is generic and easily adaptable to diverse applications. Evaluations on BraTS2023, FLARE2021, and KiTS2023 demonstrate performance on par with state-of-the-art methods while offering substantially lower computational complexity.

Keywords: Transformer Model · Multi-level Attention · Discrete Wavelet Transform

1 Introduction

Medical image segmentation is fundamental to clinical applications such as tumor delineation, organ localization, and surgical planning. Deep learning-based approaches, particularly convolutional neural networks (CNNs), have demonstrated significant success by hierarchically extracting features. However, their limited receptive fields hinder the capture of long-range dependencies, a critical shortcoming in 3D applications where spatial context across distant slices is essential.

arXiv:2503.23764v2
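The wavelet-based summarization the abstract describes rests on a standard building block: a level of 3D DWT splits a feature volume into one low-pass band (a coarse summary at half resolution per axis) plus detail bands, and the transform is exactly invertible, so downsampling and reconstruction need no learned upsampling layers. The sketch below is a minimal illustration of that block using a Haar wavelet in NumPy; it is not the paper's implementation, and the function names are illustrative.

```python
import numpy as np

def haar_dwt3d(x):
    """One-level 3D Haar DWT: split a volume into 8 subbands.
    Returns the low-pass (LLL) band plus the 7 detail bands."""
    def split(a, axis):
        lo = (a.take(range(0, a.shape[axis], 2), axis) +
              a.take(range(1, a.shape[axis], 2), axis)) / np.sqrt(2)
        hi = (a.take(range(0, a.shape[axis], 2), axis) -
              a.take(range(1, a.shape[axis], 2), axis)) / np.sqrt(2)
        return lo, hi
    bands = [x]
    for axis in range(3):  # filter along depth, height, then width
        bands = [b for pair in (split(b, axis) for b in bands) for b in pair]
    return bands[0], bands[1:]

def haar_idwt3d(lll, details):
    """Inverse transform: merge the 8 subbands back into the volume."""
    def merge(lo, hi, axis):
        even, odd = (lo + hi) / np.sqrt(2), (lo - hi) / np.sqrt(2)
        shape = list(lo.shape); shape[axis] *= 2
        out = np.empty(shape)
        sl_e = [slice(None)] * lo.ndim; sl_e[axis] = slice(0, None, 2)
        sl_o = [slice(None)] * lo.ndim; sl_o[axis] = slice(1, None, 2)
        out[tuple(sl_e)], out[tuple(sl_o)] = even, odd
        return out
    bands = [lll] + list(details)
    for axis in reversed(range(3)):
        bands = [merge(bands[i], bands[i + 1], axis)
                 for i in range(0, len(bands), 2)]
    return bands[0]

vol = np.random.rand(8, 8, 8)            # toy "feature volume"
lll, details = haar_dwt3d(vol)
print(lll.shape)                          # (4, 4, 4): coarse summary band
print(np.allclose(haar_idwt3d(lll, details), vol))  # True: lossless
```

Because the low-pass band carries most of the global context at one-eighth the voxel count, attention can be computed there cheaply, with the detail bands retained for exact reconstruction; applying the transform recursively gives the multiple scales the abstract refers to.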
CATSE: A Context-Aware Framework for Causal Target Sound Extraction
Baligar, Shrishail, Kegler, Mikolaj, Irvin, Bryce, Stamenovic, Marko, Newsam, Shawn
Target Sound Extraction (TSE) focuses on the problem of separating sources of interest, indicated by a user's cue, from the input mixture. Most existing solutions operate in an offline fashion and are not suited to the low-latency causal processing constraints imposed by applications in live-streamed content such as augmented hearing. We introduce a family of context-aware low-latency causal TSE models suitable for real-time processing. First, we explore the utility of context by providing the TSE model with oracle information about what sound classes make up the input mixture, where the objective of the model is to extract one or more sources of interest indicated by the user. Since the practical applications of oracle models are limited due to their assumptions, we introduce a composite multi-task training objective involving separation and classification losses. Our evaluation involving single- and multi-source extraction shows the benefit of using context information in the model either by means of providing full context or via the proposed multi-task training loss without the need for full context information. Specifically, we show that our proposed model outperforms size- and latency-matched Waveformer, a state-of-the-art model for real-time TSE.
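The composite multi-task objective described above pairs a separation loss on the extracted waveform with a classification loss on the predicted mixture composition, so the model learns context without oracle labels at inference. The following is a hedged sketch of such an objective with NumPy stand-ins: the SI-SNR separation term and softmax cross-entropy are standard choices, and the weights `alpha`/`beta` are illustrative, not taken from the paper.

```python
import numpy as np

def si_snr_loss(est, ref, eps=1e-8):
    """Negative scale-invariant SNR between estimate and reference."""
    ref_zm, est_zm = ref - ref.mean(), est - est.mean()
    proj = (est_zm @ ref_zm) / (ref_zm @ ref_zm + eps) * ref_zm
    noise = est_zm - proj
    return -10 * np.log10((proj @ proj) / (noise @ noise + eps) + eps)

def cross_entropy(logits, target_idx):
    """Softmax cross-entropy for a mixture-class prediction head."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target_idx]

def composite_loss(est, ref, logits, target_idx, alpha=1.0, beta=0.5):
    # alpha/beta weighting is an assumption for illustration
    return alpha * si_snr_loss(est, ref) + beta * cross_entropy(logits, target_idx)

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)                 # 1 s of target audio @ 16 kHz
est = ref + 0.1 * rng.standard_normal(16000)     # imperfect extraction
logits = np.array([2.0, 0.1, -1.0])              # 3 candidate sound classes
print(composite_loss(est, ref, logits, target_idx=0))
```

Minimizing the classification term pushes the model's internal representation to encode what classes are present, approximating the benefit of full oracle context.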
Waveformer for modelling dynamical systems
Navaneeth, N, Chakraborty, Souvik
Neural operators have gained recognition as potent tools for learning solutions of a family of partial differential equations. The state-of-the-art neural operators excel at approximating the functional relationship between input functions and the solution space, potentially reducing computational costs and enabling real-time applications. However, they often fall short when tackling time-dependent problems, particularly in delivering accurate long-term predictions. In this work, we propose "Waveformer", a novel operator learning approach for learning solutions of dynamical systems. The proposed Waveformer exploits the wavelet transform to capture the spatial multi-scale behavior of the solution field, and transformers to capture the long-horizon dynamics. We present four numerical examples, involving Burgers' equation, the Kuramoto-Sivashinsky (KS) equation, the Allen-Cahn equation, and the Navier-Stokes equations, to illustrate the efficacy of the proposed approach. The results indicate that the proposed Waveformer can learn the solution operator with high accuracy, outperforming existing state-of-the-art operator learning algorithms by up to an order of magnitude, with its advantage particularly visible in the extrapolation region.
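The spatial encoding step of this approach, decomposing the solution field into multi-scale wavelet coefficients before the transformer models their evolution in time, can be illustrated with a multi-level 1D Haar pyramid. This sketch shows only that encoding step (the transformer rollout is omitted), on an assumed toy field mixing smooth and fine-scale components:

```python
import numpy as np

def haar_pyramid(u, levels):
    """Multi-level 1D Haar decomposition of a spatial field u.
    Returns the coarsest approximation and the detail coefficients
    collected at each level (finest first)."""
    details = []
    for _ in range(levels):
        lo = (u[0::2] + u[1::2]) / np.sqrt(2)   # coarse approximation
        hi = (u[0::2] - u[1::2]) / np.sqrt(2)   # local detail
        details.append(hi)
        u = lo
    return u, details

def haar_reconstruct(u, details):
    """Exactly invert haar_pyramid."""
    for hi in reversed(details):
        out = np.empty(2 * u.size)
        out[0::2] = (u + hi) / np.sqrt(2)
        out[1::2] = (u - hi) / np.sqrt(2)
        u = out
    return u

x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
u0 = np.sin(x) + 0.3 * np.sin(8 * x)        # smooth + fine-scale structure
coarse, details = haar_pyramid(u0, levels=3)
print(coarse.size, [d.size for d in details])    # 8 [32, 16, 8]
print(np.allclose(haar_reconstruct(coarse, details), u0))  # True
```

The coefficient vectors at each level separate the field's scales, so a sequence model operating on them can track slow, large-scale dynamics and fast, fine-scale dynamics with the appropriate resolution, and the exact inverse maps predicted coefficients back to the physical field.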