AITopics | stft

Collaborating Authors

stft

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Time-Frequency Filtering Meets Graph Clustering

Colominas, Marcelo A., Steinerberger, Stefan, Wu, Hau-Tieng

arXiv.org Artificial IntelligenceOct-10-2025

We show that the problem of identifying different signal components from a time-frequency representation can be equivalently phrased as a graph clustering problem: given a graph $G=(V,E)$ one aims to identify `clusters', subgraphs that are strongly connected and have relatively few connections between them. The graph clustering problem is well studied, we show how these ideas can suggest (many) new ways to identify signal components. Numerical experiments illustrate the ideas.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.07503

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Low-Rank Time-Frequency Synthesis

Neural Information Processing SystemsSep-30-2025, 08:52:29 GMT

Many single-channel signal decomposition techniques rely on a low-rank factorization of a time-frequency transform. In particular, nonnegative matrix factorization (NMF) of the spectrogram -- the (power) magnitude of the short-time Fourier transform (STFT) -- has been considered in many audio applications. In this setting, NMF with the Itakura-Saito divergence was shown to underly a generative Gaussian composite model (GCM) of the STFT, a step forward from more empirical approaches based on ad-hoc transform and divergence specifications. Still, the GCM is not yet a generative model of the raw signal itself, but only of its STFT. The work presented in this paper fills in this ultimate gap by proposing a novel signal synthesis model with low-rank time-frequency structure. In particular, our new approach opens doors to multi-resolution representations, that were not possible in the traditional NMF setting. We describe two expectation-maximization algorithms for estimation in the new model and report audio signal processing results with music decomposition and speech enhancement.

electronic proceedings, low-rank time-frequency synthesis, name change, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

TFOC-Net: A Short-time Fourier Transform-based Deep Learning Approach for Enhancing Cross-Subject Motor Imagery Classification

Habashi, Ahmed G., Azab, Ahmed M., Eldawlatly, Seif, Aly, Gamal M.

arXiv.org Artificial IntelligenceJul-4-2025

Cross-subject motor imagery (CS-MI) classification in brain-computer interfaces (BCIs) is a challenging task due to the significant variability in Electroencephalography (EEG) patterns across different individuals. This variability often results in lower classification accuracy compared to subject-specific models, presenting a major barrier to developing calibration-free BCIs suitable for real-world applications. In this paper, we introduce a novel approach that significantly enhances cross-subject MI classification performance through optimized preprocessing and deep learning techniques. Our approach involves direct classification of Short-Time Fourier Transform (STFT)-transformed EEG data, optimized STFT parameters, and a balanced batching strategy during training of a Convolutional Neural Network (CNN). This approach is uniquely validated across four different datasets, including three widely-used benchmark datasets leading to substantial improvements in cross-subject classification, achieving 67.60% on the BCI Competition IV Dataset 1 (IV-1), 65.96% on Dataset 2A (IV-2A), and 80.22% on Dataset 2B (IV-2B), outperforming state-of-the-art techniques. Additionally, we systematically investigate the classification performance using MI windows ranging from the full 4-second window to 1-second windows. These results establish a new benchmark for generalizable, calibration-free MI classification in addition to contributing a robust open-access dataset to advance research in this domain.

artificial intelligence, classification, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2507.0251

Country:

Europe > Austria > Styria > Graz (0.05)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
Europe > Switzerland (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Promising Solution (0.86)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform

Leiber, Maxime, Marnissi, Yosra, Barrau, Axel, Meignen, Sylvain, Massoulié, Laurent

arXiv.org Artificial IntelligenceJun-27-2025

The short-time Fourier transform (STFT) is widely used for analyzing non-stationary signals. However, its performance is highly sensitive to its parameters, and manual or heuristic tuning often yields suboptimal results. To overcome this limitation, we propose a unified differentiable formulation of the STFT that enables gradient-based optimization of its parameters. This approach addresses the limitations of traditional STFT parameter tuning methods, which often rely on computationally intensive discrete searches. It enables fine-tuning of the time-frequency representation (TFR) based on any desired criterion. Moreover, our approach integrates seamlessly with neural networks, allowing joint optimization of the STFT parameters and network weights. The efficacy of the proposed differentiable STFT in enhancing TFRs and improving performance in downstream tasks is demonstrated through experiments on both simulated and real-world data.

artificial intelligence, machine learning, window length, (18 more...)

arXiv.org Artificial Intelligence

2506.2144

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)

Add feedback

Low-Rank Time-Frequency Synthesis

Cédric Févotte, Matthieu Kowalski

Neural Information Processing SystemsFeb-8-2025, 22:48:57 GMT

Many single-channel signal decomposition techniques rely on a low-rank factorization of a time-frequency transform. In particular, nonnegative matrix factorization (NMF) of the spectrogram - the (power) magnitude of the short-time Fourier transform (STFT) - has been considered in many audio applications. In this setting, NMF with the Itakura-Saito divergence was shown to underly a generative Gaussian composite model (GCM) of the STFT, a step forward from more empirical approaches based on ad-hoc transform and divergence specifications. Still, the GCM is not yet a generative model of the raw signal itself, but only of its STFT. The work presented in this paper fills in this ultimate gap by proposing a novel signal synthesis model with low-rank time-frequency structure. In particular, our new approach opens doors to multi-resolution representations, that were not possible in the traditional NMF setting. We describe two expectation-maximization algorithms for estimation in the new model and report audio signal processing results with music decomposition and speech enhancement.

artificial intelligence, coefficient, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

Add feedback

Improving snore detection under limited dataset through harmonic/percussive source separation and convolutional neural networks

Gonzalez-Martinez, F. D., Carabias-Orti, J. J., Canadas-Quesada, F. J., Ruiz-Reyes, N., Martinez-Munoz, D., Garcia-Galan, S.

arXiv.org Artificial IntelligenceOct-31-2024

Snoring, an acoustic biomarker commonly observed in individuals with Obstructive Sleep Apnoea Syndrome (OSAS), holds significant potential for diagnosing and monitoring this recognized clinical disorder. Irrespective of snoring types, most snoring instances exhibit identifiable harmonic patterns manifested through distinctive energy distributions over time. In this work, we propose a novel method to differentiate monaural snoring from non-snoring sounds by analyzing the harmonic content of the input sound using harmonic/percussive sound source separation (HPSS). The resulting feature, based on the harmonic spectrogram from HPSS, is employed as input data for conventional neural network architectures, aiming to enhance snoring detection performance even under a limited data learning framework. To evaluate the performance of our proposal, we studied two different scenarios: 1) using a large dataset of snoring and interfering sounds, and 2) using a reduced training set composed of around 1% of the data material. In the former scenario, the proposed HPSS-based feature provides competitive results compared to other input features from the literature. However, the key advantage of the proposed method lies in the superior performance of the harmonic spectrogram derived from HPSS in a limited data learning context. In this particular scenario, using the proposed harmonic feature significantly enhances the performance of all the studied architectures in comparison to the classical input features documented in the existing literature. This finding clearly demonstrates that incorporating harmonic content enables more reliable learning of the essential time-frequency characteristics that are prevalent in most snoring sounds, even in scenarios where the amount of training data is limited.

architecture, detection, spectrogram, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.apacoust.2023.109811

2410.23796

Country:

North America > Canada (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Spain > Andalusia (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Sleep (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement

Aldarmaki, Ibrahim, Solorio, Thamar, Raj, Bhiksha, Aldarmaki, Hanan

arXiv.org Artificial IntelligenceOct-7-2024

Neural multi-channel speech enhancement models, in particular those based on the U-Net architecture, demonstrate promising performance and generalization potential. These models typically encode input channels independently, and integrate the channels during later stages of the network. In this paper, we propose a novel modification of these models by incorporating relative information from the outset, where each channel is processed in conjunction with a reference channel through stacking. This input strategy exploits comparative differences to adaptively fuse information between channels, thereby capturing crucial spatial information and enhancing the overall performance. The experiments conducted on the CHiME-3 dataset demonstrate improvements in speech enhancement metrics across various architectures.

architecture, enhancement, speech enhancement, (13 more...)

arXiv.org Artificial Intelligence

2410.05019

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Investigation of Time-Frequency Feature Combinations with Histogram Layer Time Delay Neural Networks

Mohammadi, Amirmohammad, Masabarakiza, Iren'e, Barnes, Ethan, Carreiro, Davelle, Van Dine, Alexandra, Peeples, Joshua

arXiv.org Artificial IntelligenceSep-20-2024

While deep learning has reduced the prevalence of manual feature extraction, transformation of data via feature engineering remains essential for improving model performance, particularly for underwater acoustic signals. The methods by which audio signals are converted into time-frequency representations and the subsequent handling of these spectrograms can significantly impact performance. This work demonstrates the performance impact of using different combinations of time-frequency features in a histogram layer time delay neural network. An optimal set of features is identified with results indicating that specific feature combinations outperform single data features.

artificial intelligence, classification, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2409.13881

Country:

North America > United States > Texas > Brazos County > College Station (0.15)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Lexington (0.04)

Genre: Research Report (1.00)

Industry: Government > Regional Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Deep Learning-based Machine Condition Diagnosis using Short-time Fourier Transformation Variants

Piedad, Eduardo Jr, Mayordo, Zherish Galvin, Prieto-Araujo, Eduardo, Gomis-Bellmunt, Oriol

arXiv.org Artificial IntelligenceAug-18-2024

In motor condition diagnosis, electrical current signature serves as an alternative feature to vibration-based sensor data, which is a more expensive and invasive method. Machine learning (ML) techniques have been emerging in diagnosing motor conditions using only motor phase current signals. This study converts time-series motor current signals to time-frequency 2D plots using Short-time Fourier Transform (STFT) methods. The motor current signal dataset consists of 3,750 sample points with five classes - one healthy and four synthetically-applied motor fault conditions, and with five loading conditions: 0, 25, 50, 75, and 100%. Five transformation methods are used on the dataset: non-overlap and overlap STFTs, non-overlap and overlap realigned STFTs, and synchrosqueezed STFT. Then, deep learning (DL) models based on the previous Convolutional Neural Network (CNN) architecture are trained and validated from generated plots of each method. The DL models of overlap-STFT, overlap R-STFT, non-overlap STFT, non-overlap R-STFT, and synchrosqueezed-STFT performed exceptionally with an average accuracy of 97.65, 96.03, 96.08, 96.32, and 88.27%, respectively. Four methods outperformed the previous best ML method with 93.20% accuracy, while all five outperformed previous 2D-plot-based methods with accuracy of 80.25, 74.80, and 82.80%, respectively, using the same dataset, same DL architecture, and validation steps.

diagnosis, fault diagnosis, stft, (13 more...)

arXiv.org Artificial Intelligence

2408.09649

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
Asia > Philippines > Luzon > National Capital Region > City of Quezon (0.05)
North America > United States (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders

Cao, Yubing, Li, Yongming, Wang, Liejun, Yu, Yinfeng

arXiv.org Artificial IntelligenceAug-13-2024

Since the introduction of Generative Adversarial Networks (GANs) in speech synthesis, remarkable achievements have been attained. In a thorough exploration of vocoders, it has been discovered that audio waveforms can be generated at speeds exceeding real-time while maintaining high fidelity, achieved through the utilization of GAN-based models. Typically, the inputs to the vocoder consist of band-limited spectral information, which inevitably sacrifices high-frequency details. To address this, we adopt the full-band Mel spectrogram information as input, aiming to provide the vocoder with the most comprehensive information possible. However, previous studies have revealed that the use of full-band spectral information as input can result in the issue of over-smoothing, compromising the naturalness of the synthesized speech. To tackle this challenge, we propose VNet, a GAN-based neural vocoder network that incorporates full-band spectral information and introduces a Multi-Tier Discriminator (MTD) comprising multiple sub-discriminators to generate high-resolution signals. Additionally, we introduce an asymptotically constrained method that modifies the adversarial loss of the generator and discriminator, enhancing the stability of the training process. Through rigorous experiments, we demonstrate that the VNet model is capable of generating high-fidelity speech and significantly improving the performance of the vocoder.

discriminator, international conference, spectrogram, (14 more...)

arXiv.org Artificial Intelligence

2408.06906

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China > Xinjiang Uygur Autonomous Region (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.74)

Add feedback