clean signal
Real-Time Speech Enhancement via a Hybrid ViT: A Dual-Input Acoustic-Image Feature Fusion
Bahmei, Behnaz, Arzanpour, Siamak, Birmingham, Elina
Speech quality and intelligibility are significantly degraded in noisy environments. This paper presents a novel transformer-based learning framework to address the single-channel noise suppression problem for real-time applications. Although existing deep learning networks have shown remarkable improvements in handling stationary noise, their performance often diminishes in real-world environments characterized by non-stationary noise (e.g., dog barking, baby crying). The proposed dual-input acoustic-image feature fusion using a hybrid ViT framework effectively models both temporal and spectral dependencies in noisy signals. Designed for real-world audio environments, the proposed framework is computationally lightweight and suitable for implementation on embedded devices. T o evaluate its effectiveness, four standard and commonly used quality measurements, namely PESQ, STOI, Seg SNR, and LLR, are utilized. Experimental results obtained using the Librispeech dataset as the clean speech source and the Ur-banSound8K and Google Audioset datasets as the noise sources, demonstrate that the proposed method significantly improves noise reduction, speech intelligibility, and perceptual quality compared to the noisy input signal, achieving performance close to the clean reference.
Synthetic Time Series Forecasting with Transformer Architectures: Extensive Simulation Benchmarks
Forootani, Ali, Khosravi, Mohammad
Time series forecasting plays a critical role in domains such as energy, finance, and healthcare, where accurate predictions inform decision-making under uncertainty. Although Transformer-based models have demonstrated success in sequential modeling, their adoption for time series remains limited by challenges such as noise sensitivity, long-range dependencies, and a lack of inductive bias for temporal structure. In this work, we present a unified and principled framework for benchmarking three prominent Transformer forecasting architectures-Autoformer, Informer, and Patchtst-each evaluated through three architectural variants: Minimal, Standard, and Full, representing increasing levels of complexity and modeling capacity. We conduct over 1500 controlled experiments on a suite of ten synthetic signals, spanning five patch lengths and five forecast horizons under both clean and noisy conditions. Our analysis reveals consistent patterns across model families. To advance this landscape further, we introduce the Koopman-enhanced Transformer framework, Deep Koopformer, which integrates operator-theoretic latent state modeling to improve stability and interpretability. We demonstrate its efficacy on nonlinear and chaotic dynamical systems. Our results highlight Koopman based Transformer as a promising hybrid approach for robust, interpretable, and theoretically grounded time series forecasting in noisy and complex real-world conditions.
A Solid-State Nanopore Signal Generator for Training Machine Learning Models
Johnson, Jaise, Galigekere, Chinmayi R, Varma, Manoj M
Translocation event detection from raw nanopore current signals is a fundamental step in nanopore signal analysis. Traditional data analysis methods rely on user-defined parameters to extract event information, making the interpretation of experimental results sensitive to parameter choice. While Machine Learning (ML) has seen widespread adoption across various scientific fields, its potential remains underexplored in solid-state nanopore research. In this work, we introduce a nanopore signal generator capable of producing extensive synthetic datasets for machine learning applications and benchmarking nanopore signal analysis platforms. Using this generator, we train deep learning models to detect translocation events directly from raw signals, achieving over 99% true event detection with minimal false positives.
Subspace Defense: Discarding Adversarial Perturbations by Learning a Subspace for Clean Signals
Zheng, Rui, Zhou, Yuhao, Xi, Zhiheng, Gui, Tao, Zhang, Qi, Huang, Xuanjing
Deep neural networks (DNNs) are notoriously vulnerable to adversarial attacks that place carefully crafted perturbations on normal examples to fool DNNs. To better understand such attacks, a characterization of the features carried by adversarial examples is needed. In this paper, we tackle this challenge by inspecting the subspaces of sample features through spectral analysis. We first empirically show that the features of either clean signals or adversarial perturbations are redundant and span in low-dimensional linear subspaces respectively with minimal overlap, and the classical low-dimensional subspace projection can suppress perturbation features out of the subspace of clean signals. This makes it possible for DNNs to learn a subspace where only features of clean signals exist while those of perturbations are discarded, which can facilitate the distinction of adversarial examples. To prevent the residual perturbations that is inevitable in subspace learning, we propose an independence criterion to disentangle clean signals from perturbations. Experimental results show that the proposed strategy enables the model to inherently suppress adversaries, which not only boosts model robustness but also motivates new directions of effective adversarial defense.
A Self-Supervised Algorithm for Denoising Photoplethysmography Signals for Heart Rate Estimation from Wearables
Jain, Pranay, Ding, Cheng, Rudin, Cynthia, Hu, Xiao
Smart watches and other wearable devices are equipped with photoplethysmography (PPG) sensors for monitoring heart rate and other aspects of cardiovascular health. However, PPG signals collected from such devices are susceptible to corruption from noise and motion artifacts, which cause errors in heart rate estimation. Typical denoising approaches filter or reconstruct the signal in ways that eliminate much of the morphological information, even from the clean parts of the signal that would be useful to preserve. In this work, we develop an algorithm for denoising PPG signals that reconstructs the corrupted parts of the signal, while preserving the clean parts of the PPG signal. Our novel framework relies on self-supervised training, where we leverage a large database of clean PPG signals to train a denoising autoencoder. As we show, our reconstructed signals provide better estimates of heart rate from PPG signals than the leading heart rate estimation methods. Further experiments show significant improvement in Heart Rate Variability (HRV) estimation from PPG signals using our algorithm. We conclude that our algorithm denoises PPG signals in a way that can improve downstream analysis of many different health metrics from wearable devices.
Jammer classification with Federated Learning
Wu, Peng, Calatrava, Helena, Imbiriba, Tales, Closas, Pau
Jamming signals can jeopardize the operation of GNSS receivers until denying its operation. Given their ubiquity, jamming mitigation and localization techniques are of crucial importance, for which jammer classification is of help. Data-driven models have been proven useful in detecting these threats, while their training using crowdsourced data still poses challenges when it comes to private data sharing. This article investigates the use of federated learning to train jamming signal classifiers locally on each device, with model updates aggregated and averaged at the central server. This allows for privacy-preserving training procedures that do not require centralized data storage or access to client local data. The used framework FedAvg is assessed on a dataset consisting of spectrogram images of simulated interfered GNSS signal. Six different jammer types are effectively classified with comparable results to a fully centralized solution that requires vast amounts of data communication and involves privacy-preserving concerns.
Restore Seriously Degraded Human Speech using AI
"Voicefixer aims to restore human speech regardless of how serious it's degraded. It can handle noise, reveberation, low resolution (2kHz 44.1kHz) and clipping (0.1โ1.0 threshold) effect within one model." This is exactly what is mentioned in restoring the Speech using VoiceFixer. We will be using a wonderful Github Repo "VoiceFixer" to clean our audio input files.
Audio Denoising with Deep Network Priors
Michelashvili, Michael, Wolf, Lior
We present a method for audio denoising that combines processing done in both the time domain and the time-frequency domain. Given a noisy audio clip, the method trains a deep neural network to fit this signal. Since the fitting is only partly successful and is able to better capture the underlying clean signal than the noise, the output of the network helps to disentangle the clean audio from the rest of the signal. The method is completely unsupervised and only trains on the specific audio clip that is being denoised. Our experiments demonstrate favorable performance in comparison to the literature methods, and our code and audio samples are available at https: //github.com/mosheman5/DNP. Index Terms: Audio denoising; Unsupervised learning
Monaural source enhancement maximizing source-to-distortion ratio via automatic differentiation
Nakajima, Hiroaki, Takahashi, Yu, Kondo, Kazunobu, Hisaminato, Yuji
Recently, deep neural network (DNN) has made a breakthrough in monaural source enhancement. Through a training step by using a large amount of data, DNN estimates a mapping between mixed signals and clean signals. At this time, we use an objective function that numerically expresses the quality of a mapping by DNN. In the conventional methods, L1 norm, L2 norm, and Itakura-Saito divergence are often used as objective functions. Recently, an objective function based on short-time objective intelligibility (STOI) has also been proposed. However, these functions only indicate similarity between the clean signal and the estimated signal by DNN. In other words, they do not show the quality of noise reduction or source enhancement. Motivated by the fact, this paper adopts signal-to-distortion ratio (SDR) as the objective function. Since SDR virtually shows signal-to-noise ratio (SNR), maximizing SDR solves the above problem. The experimental results revealed that the proposed method achieved better performance than the conventional methods.