Beamformer


Proactive Hearing Assistants that Isolate Egocentric Conversations

Hu, Guilin, Itani, Malek, Chen, Tuochao, Gollakota, Shyamnath

arXiv.org Artificial Intelligence

We introduce proactive hearing assistants that automatically identify and separate the wearer's conversation partners, without requiring explicit prompts. Our system operates on egocentric binaural audio and uses the wearer's self-speech as an anchor, leveraging turn-taking behavior and dialogue dynamics to infer conversational partners and suppress others. To enable real-time, on-device operation, we propose a dual-model architecture: a lightweight streaming model runs every 12.5 ms for low-latency extraction of the conversation partners, while a slower model runs less frequently to capture longer-range conversational dynamics. Results on real-world 2- and 3-speaker conversation test sets, collected with binaural egocentric hardware from 11 participants totaling 6.8 hours, show generalization in identifying and isolating conversational partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement. More information can be found on our website: https://proactivehearing.cs.washington.edu/
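
As a rough illustration of the dual-model idea above (a hypothetical sketch, not the authors' implementation; all names are illustrative), a scheduler can run a lightweight model on every 12.5 ms frame while a slower model refreshes longer-range conversational state at a lower rate:

```python
def dual_rate_process(frames, fast_model, slow_model, slow_every=40):
    """Toy dual-rate loop: `fast_model` runs on every 12.5 ms frame for
    low-latency extraction; `slow_model` refreshes longer-range
    conversational state every `slow_every` frames."""
    state = None
    outputs = []
    for i, frame in enumerate(frames):
        if i % slow_every == 0:
            # slow path: look back over a longer window of recent frames
            state = slow_model(frames[max(0, i - slow_every):i + 1])
        # fast path: per-frame extraction conditioned on the cached state
        outputs.append(fast_model(frame, state))
    return outputs
```

With `slow_every=40` and 12.5 ms hops, the slow model would run every 0.5 s, which matches the spirit (though not necessarily the exact rate) of the described architecture.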


Online neural fusion of distortionless differential beamformers for robust speech enhancement

Qian, Yuanhang, Zhao, Kunlong, Jin, Jilu, Luo, Xueqin, Huang, Gongping, Chen, Jingdong, Benesty, Jacob

arXiv.org Artificial Intelligence

Fixed beamforming is widely used in practice since it does not depend on the estimation of noise statistics and provides relatively stable performance. However, a single beamformer cannot adapt to varying acoustic conditions, which limits its interference suppression capability. To address this, adaptive convex combination (ACC) algorithms have been introduced, where the outputs of multiple fixed beamformers are linearly combined to improve robustness. Nevertheless, ACC often fails in highly non-stationary scenarios, such as rapidly moving interference, since its adaptive updates cannot reliably track rapid changes. To overcome this limitation, we propose a frame-online neural fusion framework for multiple distortionless differential beamformers, which estimates the combination weights through a neural network. Compared with conventional ACC, the proposed method adapts more effectively to dynamic acoustic environments, achieving stronger interference suppression while maintaining the distortionless constraint.
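
To make the combination idea concrete, here is a minimal numpy sketch of convexly combining fixed-beamformer outputs; the softmax-parameterized gradient update is a simplified stand-in for the classical ACC recursion, and all names are illustrative:

```python
import numpy as np

def convex_combine(outputs, weights):
    """Combine fixed-beamformer outputs y_k[n] with convex weights.

    outputs: (K, N) array, one row per beamformer output.
    weights: (K,) nonnegative weights summing to 1."""
    w = np.asarray(weights, dtype=float)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
    return w @ np.asarray(outputs)

def adapt_weights(outputs, target, lr=0.5, steps=200):
    """Adapt unconstrained logits by gradient descent on the MSE to a
    target signal; the softmax keeps the combination convex."""
    Y = np.asarray(outputs, dtype=float)
    t = np.asarray(target, dtype=float)
    logits = np.zeros(len(outputs))
    for _ in range(steps):
        w = np.exp(logits) / np.exp(logits).sum()
        err = t - w @ Y                     # residual of the combined output
        grad = -(Y @ err) / Y.shape[1]      # d(MSE)/dw, up to a constant
        jac = np.diag(w) - np.outer(w, w)   # softmax Jacobian (symmetric)
        logits -= lr * (jac @ grad)
    return np.exp(logits) / np.exp(logits).sum()
```

Given one beamformer output close to the target and one dominated by interference, the adapted weights concentrate on the former, which is exactly the behavior ACC exploits.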


SmartUT: Receive Beamforming for Spectral Coexistence of NGSO Satellite Systems

Saifaldawla, Almoatssimbillah, Lagunas, Eva, Ortiz, Flor, Adam, Abuzar B. M., Chatzinotas, Symeon

arXiv.org Artificial Intelligence

In this paper, we investigate downlink co-frequency interference (CFI) mitigation in co-existing non-geostationary satellite orbit (NGSO) systems. Traditional mitigation techniques, such as zero-forcing (ZF), place a null toward the directions of arrival (DOAs) of the interfering signals, but they suffer from high computational complexity due to matrix inversions and the required knowledge of channel state information (CSI). Furthermore, adaptive beamformers, such as sample matrix inversion (SMI)-based minimum variance beamformers, perform poorly when the available snapshots are limited. We propose a Mamba-based beamformer (MambaBF) that leverages a self-supervised deep learning (DL) approach and can be deployed on the user terminal (UT) antenna array to assist downlink beamforming and CFI mitigation, using only a limited number of available array snapshots as input and without CSI knowledge. The introduction adds that satellite communications (SatCom) will play a vital role in next-generation wireless networks by providing service to vast areas that lack terrestrial network coverage, especially with the rapidly growing Low-Earth orbit (LEO) mega-constellations [1].
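
As background for the snapshot-limited setting described above, here is a minimal numpy sketch of the SMI-based minimum-variance (MVDR) baseline with diagonal loading, for an assumed uniform linear array (this is the conventional method the paper improves on, not MambaBF itself):

```python
import numpy as np

def steering_vector(theta_deg, n_ant, d=0.5):
    """ULA steering vector; element spacing d in wavelengths (assumed geometry)."""
    n = np.arange(n_ant)
    return np.exp(-2j * np.pi * d * n * np.sin(np.deg2rad(theta_deg)))

def smi_mvdr(snapshots, theta_deg, loading=1e-2):
    """Sample-matrix-inversion MVDR with diagonal loading.

    snapshots: (n_ant, n_snap) complex received samples. Diagonal loading
    stabilises the inverse when few snapshots are available."""
    n_ant, n_snap = snapshots.shape
    R = snapshots @ snapshots.conj().T / n_snap           # sample covariance
    R += loading * np.trace(R).real / n_ant * np.eye(n_ant)
    a = steering_vector(theta_deg, n_ant)
    Ri_a = np.linalg.solve(R, a)
    return Ri_a / (a.conj() @ Ri_a)                       # distortionless weights
```

The weights keep unit gain toward the look direction while placing a data-driven null on the interferer, but performance degrades as the snapshot count shrinks, which is the regime MambaBF targets.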


Covariance Matrix Construction with Preprocessing-Based Spatial Sampling for Robust Adaptive Beamforming

Mohammadzadeh, Saeed, de Lamare, Rodrigo C., Zakharov, Yuriy

arXiv.org Artificial Intelligence

This work proposes an efficient, robust adaptive beamforming technique to deal with steering vector (SV) estimation mismatches and data covariance matrix reconstruction problems. In particular, the direction of arrival (DoA) of interfering sources is estimated with available snapshots, in which the angular sectors of the interfering signals are computed adaptively. Then, we utilize the well-known general linear combination algorithm to reconstruct the interference-plus-noise covariance (IPNC) matrix using preprocessing-based spatial sampling (PPBSS). We demonstrate that the preprocessing matrix can be replaced by the sample covariance matrix (SCM) in the shrinkage method. A power spectrum sampling strategy is then devised based on a preprocessing matrix computed with the estimated angular sectors' information. Moreover, the covariance matrix for the signal is formed for the angular sector of the signal-of-interest (SOI), which allows for calculating an SV for the SOI using the power method. An analysis of the array beampattern in the proposed PPBSS technique is carried out, and a study of the computational cost of competing approaches is conducted. Simulation results show the proposed method's effectiveness compared to existing approaches. Adaptive beamforming spans various fields, including wireless communications, radar, sonar, and medical imaging, where it significantly improves performance by increasing the signal-to-noise ratio (SNR) and mitigating interference [1].
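
The "general linear combination" step can be illustrated with a generic shrinkage estimator (a sketch of the general idea with an assumed fixed shrinkage factor, not the paper's adaptive PPBSS construction):

```python
import numpy as np

def shrinkage_covariance(snapshots, rho=0.1):
    """General linear combination of the sample covariance matrix (SCM)
    with a scaled identity: R_hat = (1 - rho) * SCM + rho * (tr(SCM)/M) * I.

    Keeps the covariance estimate well conditioned when the number of
    snapshots is small relative to the number of sensors M."""
    M, N = snapshots.shape
    scm = snapshots @ snapshots.conj().T / N
    mu = np.trace(scm).real / M              # average per-sensor power
    return (1 - rho) * scm + rho * mu * np.eye(M)
```

In the snapshot-starved regime the raw SCM is rank deficient and cannot be inverted for beamforming; the identity term restores a strictly positive eigenvalue floor.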


An Encoder-Decoder Network for Beamforming over Sparse Large-Scale MIMO Channels

Zhang, Yubo, Johnston, Jeremy, Wang, Xiaodong

arXiv.org Artificial Intelligence

We develop an end-to-end deep learning framework for downlink beamforming over large-scale sparse multiple-input multiple-output (MIMO) channels. The core is a deep encoder-decoder network (EDN) architecture with three modules, the first of which is (i) an encoder neural network (NN), deployed at each user end, that compresses estimated downlink channels into low-dimensional latent vectors; the latent vector from each user is compressed and then fed back to the base station (BS). The training of the EDN leverages two key strategies: (a) semi-amortized learning, where the beamformer decoder NN performs analytical gradient-ascent steps during both training and inference, and (b) knowledge distillation, where the loss function combines a supervised term and an unsupervised term: training starts supervised with MMSE beamformers and, over the epochs, gradually shifts toward unsupervised training on the sum-rate objective. The proposed EDN beamforming framework is extended to both far-field and near-field hybrid beamforming scenarios.


Speaker Embeddings to Improve Tracking of Intermittent and Moving Speakers

Iatariene, Taous, Cui, Can, Guérin, Alexandre, Serizel, Romain

arXiv.org Artificial Intelligence

Speaker tracking methods often rely on spatial observations to assign coherent track identities over time. This reaches its limits in scenarios with intermittent and moving speakers, i.e., speakers that may change position while inactive, leading to discontinuous spatial trajectories. This paper investigates the use of speaker embeddings as a simple solution to this issue: we perform identity reassignment after tracking, using speaker embeddings. We leverage trajectory-related information provided by an initial tracking step together with the multichannel audio signal. Beamforming is used to enhance the signal toward the speakers' positions in order to compute speaker embeddings, which are then used to assign new track identities based on an enrollment pool. We evaluate the proposed speaker embedding-based identity reassignment method on a dataset where speakers change position during inactivity periods. Results show that it consistently improves the identity assignment performance of neural and standard tracking systems. In particular, we study the impact of beamforming and input duration on embedding extraction.
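
The post-tracking reassignment step can be sketched as a cosine-similarity match against an enrollment pool (illustrative names and toy vectors; a real system would use learned x-vector-style embeddings computed from the beamformed signals):

```python
import numpy as np

def reassign_identities(track_embeddings, enrollment_pool, names):
    """Map each track to the enrolled speaker with the highest cosine
    similarity between its embedding and the enrollment embeddings.

    track_embeddings: (n_tracks, dim); enrollment_pool: (n_speakers, dim);
    names: list of speaker labels aligned with enrollment_pool rows."""
    T = np.asarray(track_embeddings, dtype=float)
    E = np.asarray(enrollment_pool, dtype=float)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sim = T @ E.T                       # cosine similarity matrix
    return [names[j] for j in sim.argmax(axis=1)]
```

Two tracks mapped to the same enrolled speaker are then recognized as the same person, even if the tracker broke the spatial trajectory when the speaker moved while silent.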


A Low-complexity Structured Neural Network Approach to Intelligently Realize Wideband Multi-beam Beamformers

Aluvihare, Hansaka, Sivasankar, Sivakumar, Li, Xianqi, Madanayake, Arjuna, Perera, Sirani M.

arXiv.org Artificial Intelligence

True-time-delay (TTD) beamformers can produce wideband, squint-free beams in both the analog and digital signal domains, unlike frequency-dependent FFT beams. Our previous work showed that TTD beamformers can be efficiently realized using the elements of the delay Vandermonde matrix (DVM), answering the longstanding beam-squint problem. Building on our work on classical DVM-based algorithms, we propose a neural network (NN) architecture to realize wideband multi-beam beamformers using structure-imposed weight matrices and submatrices. The structure and sparsity of the weight matrices and submatrices are shown to greatly reduce the space and computational complexities of the NN, where M is the number of nodes in each layer of the network, p is the number of submatrices per layer, and M >> p. We present numerical simulations in the 24 GHz to 32 GHz range to demonstrate the feasibility of realizing wideband multi-beam beamformers using the proposed neural architecture. We also quantify the complexity reduction of the proposed NN compared with fully connected NNs, showing the efficiency of the proposed architecture without sacrificing accuracy. Accuracy was measured using the mean squared error, based on an objective function of the weight matrices and beamformed signals of the antenna arrays, with node normalization. The proposed architecture thus offers a low-complexity NN realizing wideband multi-beam beamformers in real time for low-complexity intelligent systems. H. Aluvihare is with the Department of Mathematics, Embry-Riddle Aeronautical University, Daytona Beach, FL 32703 USA (email: aluvihah@my.erau.edu). S. Sivasankar is with the Department of Electrical and Computer Engineering, Florida International University, Miami, FL 33174 USA (email: ssiva011@fiu.edu). X. Li is with the Department of Mathematics & Systems Engineering, Florida Institute of Technology, Melbourne, FL 32901, USA (e-mail: xli@fit.edu).
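
The squint-free property of TTD weights can be checked numerically with a delay Vandermonde matrix, whose (k, n) entry is exp(-j·2π·f_k·n·τ(θ)) for per-element delay τ(θ) = d·sinθ/c (assumed uniform linear array; a sketch of the matrix itself, not the paper's NN realization):

```python
import numpy as np

def delay_vandermonde(freqs, n_ant, theta_deg, d=0.005, c=3e8):
    """DVM: row k holds the TTD phases exp(-j*2*pi*f_k*n*tau), i.e. the
    array response at frequency f_k toward angle theta for a ULA with
    element spacing d (metres)."""
    tau = d * np.sin(np.deg2rad(theta_deg)) / c   # per-element true time delay
    f = np.asarray(freqs, dtype=float)[:, None]
    n = np.arange(n_ant)[None, :]
    return np.exp(-2j * np.pi * f * n * tau)
```

Conjugate-matching each row yields the full array gain at every frequency in the band, whereas reusing the mid-band phases at a band edge loses gain, which is exactly the beam-squint effect the DVM construction avoids.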


End-to-End Multi-Microphone Speaker Extraction Using Relative Transfer Functions

Eisenberg, Aviad, Gannot, Sharon, Chazan, Shlomo E.

arXiv.org Artificial Intelligence

This paper introduces a multi-microphone method for extracting a desired speaker from a mixture involving multiple speakers and directional noise in a reverberant environment. In this work, we propose leveraging the instantaneous relative transfer function (RTF), estimated from a reference utterance recorded in the same position as the desired source. The effectiveness of the RTF-based spatial cue is compared with direction of arrival (DOA)-based spatial cue and the conventional spectral embedding. Experimental results in challenging acoustic scenarios demonstrate that using spatial cues yields better performance than the spectral-based cue and that the instantaneous RTF outperforms the DOA-based spatial cue.
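
A relative transfer function can be estimated from the reference utterance with a simple per-frequency cross-PSD ratio (a minimal sketch of the general RTF idea; covariance-subtraction or covariance-whitening estimators are the more robust choices in practice):

```python
import numpy as np

def estimate_rtf(stft, ref_ch=0):
    """Estimate the RTF h_m(f) = E[X_m(f,t) X_ref(f,t)*] / E[|X_ref(f,t)|^2]
    from a multichannel STFT of shape (mics, freqs, frames)."""
    X = np.asarray(stft)
    cross = np.mean(X * X[ref_ch].conj(), axis=-1)    # cross-PSD with reference
    power = np.mean(np.abs(X[ref_ch]) ** 2, axis=-1)  # reference auto-PSD
    return cross / power                              # (mics, freqs), ref row = 1
```

By construction the reference channel's RTF is identically one, and the remaining rows capture the inter-microphone transfer (level and phase) of the source at its recorded position, which is the spatial cue the extraction network is conditioned on.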


Graph Neural Network Based Hybrid Beamforming Design in Wideband Terahertz MIMO-OFDM Systems

Li, Beier, Vu, Mai

arXiv.org Artificial Intelligence

6G wireless technology is projected to adopt higher and wider frequency bands, enabled by highly directional beamforming. However, the vast bandwidths available also make the impact of beam squint in massive multiple-input multiple-output (MIMO) systems non-negligible. Traditional approaches such as adding a true-time-delay (TTD) line on each antenna are costly due to the massive antenna arrays required. This paper puts forth a signal processing alternative, specifically adapted to the multicarrier structure of OFDM systems, through an innovative application of Graph Neural Networks (GNNs) to optimize hybrid beamforming. By integrating two types of graph nodes to represent the analog and the digital beamforming matrices efficiently, our approach not only reduces the computational and memory burdens but also achieves high spectral efficiency, approaching that of all-digital beamforming. The GNN runtime and memory requirement are a fraction of the processing time and resource consumption of traditional signal processing methods, enabling real-time adaptation of hybrid beamforming. Furthermore, the proposed GNN exhibits strong resilience to beam squint, achieving almost constant spectral efficiency even as the system bandwidth increases at higher carrier frequencies.
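
Beam squint itself is easy to reproduce numerically: frequency-flat phase shifters matched at the carrier mis-phase the array response at other subcarriers, so gain toward the intended angle drops as bandwidth grows (a toy ULA sketch with assumed parameters, not the paper's GNN):

```python
import numpy as np

def array_response(f, theta_deg, n_ant, d, c=3e8):
    """Narrowband ULA response at frequency f toward angle theta."""
    n = np.arange(n_ant)
    return np.exp(-2j * np.pi * f * d * n * np.sin(np.deg2rad(theta_deg)) / c)

def phase_shifter_gain(fc, f, theta_deg, n_ant, d):
    """Gain at subcarrier f of a beam whose phase shifters were set at fc."""
    w = array_response(fc, theta_deg, n_ant, d) / np.sqrt(n_ant)
    return abs(np.vdot(w, array_response(f, theta_deg, n_ant, d)))
```

With a 64-element half-wavelength array at 300 GHz, a subcarrier 15 GHz from the carrier already loses most of the array gain, which is why wideband hybrid designs must compensate for squint across the OFDM band.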


Switchable deep beamformer for high-quality and real-time passive acoustic mapping

Zeng, Yi, Li, Jinwei, Zhu, Hui, Lu, Shukuan, Li, Jianfeng, Cai, Xiran

arXiv.org Artificial Intelligence

Passive acoustic mapping (PAM) is a promising tool for monitoring acoustic cavitation activity in ultrasound therapy applications. Data-adaptive beamformers for PAM yield better image quality than time exposure acoustics (TEA) algorithms, but their computational cost is considerable. In this work, we develop a deep beamformer based on a generative adversarial network, which can switch between different transducer arrays and reconstruct high-quality PAM images directly from radio-frequency ultrasound signals at low computational cost. The deep beamformer was trained on a dataset of simulated and experimental cavitation signals from single and multiple microbubble clouds, measured by different (linear and phased) arrays covering 1-15 MHz. We compared its performance to TEA and three data-adaptive beamformers on the simulated and experimental test datasets. Compared with TEA, the deep beamformer reduced the energy spread area by 18.9%-65.0% and improved the image signal-to-noise ratio by 9.3-22.9 dB on average across the different arrays in our data. Compared to the data-adaptive beamformers, it reduced the computational cost by three orders of magnitude, achieving a 10.5 ms image reconstruction time in our data while matching their image quality. These results demonstrate the potential of the deep beamformer for high-resolution monitoring of microbubble cavitation activity in ultrasound therapy.
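
For reference, the TEA baseline mentioned above amounts to delay-and-sum toward each pixel followed by time integration of the summed energy; a minimal sketch with integer-sample delays and an assumed 2-D geometry (illustrative names, not the paper's implementation):

```python
import numpy as np

def tea_map(rf, sensor_pos, grid_pts, fs, c=1540.0):
    """Time exposure acoustics map.

    rf: (n_sens, n_samp) received RF traces; sensor_pos: (n_sens, 2) in m;
    grid_pts: (n_pix, 2) candidate source positions; fs: sample rate (Hz).
    Returns the time-integrated energy of the delay-and-sum output per pixel."""
    n_sens, n_samp = rf.shape
    energy = np.zeros(len(grid_pts))
    for i, p in enumerate(grid_pts):
        delays = np.linalg.norm(sensor_pos - p, axis=1) / c   # propagation times
        shifts = np.round(delays * fs).astype(int)
        summed = np.zeros(n_samp)
        for m in range(n_sens):
            summed += np.roll(rf[m], -shifts[m])              # back-propagate
        energy[i] = np.sum(summed ** 2)                       # time exposure
    return energy
```

At the true source position the back-propagated traces add coherently, so the integrated energy peaks there; the quadratic cost in pixels and sensors is the burden the deep beamformer sidesteps.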