AITopics | filterbank

Collaborating Authors

filterbank

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Unified Learnable 2D Convolutional Feature Extraction for ASR

Vieting, Peter, Hilmes, Benedikt, Schlüter, Ralf, Ney, Hermann

arXiv.org Artificial IntelligenceSep-15-2025

Neural front-ends represent a promising approach to feature extraction for automatic speech recognition (ASR) systems as they enable to learn specifically tailored features for different tasks. Yet, many of the existing techniques remain heavily influenced by classical methods. While this inductive bias may ease the system design, our work aims to develop a more generic front-end for feature extraction. Furthermore, we seek to unify the front-end architecture contrasting with existing approaches that apply a composition of several layer topologies originating from different sources. The experiments systematically show how to reduce the influence of existing techniques to achieve a generic front-end. The resulting 2D convolutional front-end is parameter-efficient and suitable for a scenario with limited computational resources unlike large models pre-trained on unlabeled audio. The results demonstrate that this generic unified approach is not only feasible but also matches the performance of existing supervised learnable feature extractors.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.10031

Country:

North America > United States (0.47)
Europe > Austria (0.46)
North America > Canada (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining > Feature Extraction (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.56)

Add feedback

Aliasing in Convnets: A Frame-Theoretic Perspective

Haider, Daniel, Lostanlen, Vincent, Ehler, Martin, Holighaus, Nicki, Balazs, Peter

arXiv.org Artificial IntelligenceJul-9-2025

Using a stride in a convolutional layer inherently introduces aliasing, which has implications for numerical stability and statistical generalization. While techniques such as the parametrizations via paraunitary systems have been used to promote orthogonal convolution and thus ensure Parseval stability, a general analysis of aliasing and its effects on the stability has not been done in this context. In this article, we adapt a frame-theoretic approach to describe aliasing in convolutional layers with 1D kernels, leading to practical estimates for stability bounds and characterizations of Parseval stability, that are tailored to take short kernel sizes into account. From this, we derive two computationally very efficient optimization objectives that promote Parseval stability via systematically suppressing aliasing. Finally, for layers with random kernels, we derive closed-form expressions for the expected value and variance of the terms that describe the aliasing effects, revealing fundamental insights into the aliasing behavior at initialization.

artificial intelligence, filterbank, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2507.06152

Country: Europe (0.68)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Acoustic Classification of Maritime Vessels using Learnable Filterbanks

Elsborg, Jonas, Vegge, Tejs, Bhowmik, Arghya

arXiv.org Artificial IntelligenceJun-2-2025

Reliably monitoring and recognizing maritime vessels based on acoustic signatures is complicated by the variability of different recording scenarios. A robust classification framework must be able to generalize across diverse acoustic environments and variable source-sensor distances. To this end, we present a deep learning model with robust performance across different recording scenarios. Using a trainable spectral front-end and temporal feature encoder to learn a Gabor filterbank, the model can dynamically emphasize different frequency components. Trained on the VTUAD hydrophone recordings from the Strait of Georgia, our model, CATFISH, achieves a state-of-the-art 96.63 % percent test accuracy across varying source-sensor distances, surpassing the previous benchmark by over 12 percentage points. We present the model, justify our architectural choices, analyze the learned Gabor filters, and perform ablation studies on sensor data fusion and attention-based pooling.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.23964

Country: Europe > Denmark (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

FLEXtime: Filterbank learning for explaining time series

Brüsch, Thea, Wickstrøm, Kristoffer K., Schmidt, Mikkel N., Jenssen, Robert, Alstrøm, Tommy S.

arXiv.org Artificial IntelligenceNov-6-2024

State-of-the-art methods for explaining predictions based on time series are built on learning an instance-wise saliency mask for each time step. However, for many types of time series, the salient information is found in the frequency domain. Adopting existing methods to the frequency domain involves naively zeroing out frequency content in the signals, which goes against established signal processing theory. Therefore, we propose a new method entitled FLEXtime, that uses a filterbank to split the time series into frequency bands and learns the optimal combinations of these bands. FLEXtime avoids the drawbacks of zeroing out frequency bins and is more stable and easier to train compared to the naive method. Our extensive evaluation shows that FLEXtime on average outperforms state-of-the-art explainability methods across a range of datasets. FLEXtime fills an important gap in the time series explainability literature and can provide a valuable tool for a wide range of time series like EEG and audio.

explanation, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.05841

Country:

Europe > Norway (0.04)
North America > United States (0.04)
Europe > Italy > Marche > Ancona Province > Ancona (0.04)
Asia > Thailand (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Reviews: Learning filter widths of spectral decompositions with wavelets

Neural Information Processing SystemsOct-7-2024, 05:58:21 GMT

In the context of time series classification, this paper introduces the Wavelet Deconvolution layer, which is able to learn the filters width of a neural network-based classifier, which is usually tuned by hand through hyper-parameters grid search. Experimental studies on time series are presented, including a pilot study using artificial data, a phone recognition study on TIMIT and a time series classification study on the Haptic UCR dataset. It is shown that the proposed method leads to similar of better results than state-of-the-art systems. Quality: The paper is correct to my understanding. The experiments are convincing and clearly show the benefit of the proposed approach.

annual conference, international speech communication association, spectral decomposition, (7 more...)

Neural Information Processing Systems

Country: Asia > Singapore (0.06)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

Add feedback

Revisiting Acoustic Features for Robust ASR

Shah, Muhammad A., Raj, Bhiksha

arXiv.org Artificial IntelligenceSep-24-2024

Automatic Speech Recognition (ASR) systems must be robust to the myriad types of noises present in real-world environments including environmental noise, room impulse response, special effects as well as attacks by malicious actors (adversarial attacks). Recent works seek to improve accuracy and robustness by developing novel Deep Neural Networks (DNNs) and curating diverse training datasets for them, while using relatively simple acoustic features. While this approach improves robustness to the types of noise present in the training data, it confers limited robustness against unseen noises and negligible robustness to adversarial attacks. In this paper, we revisit the approach of earlier works that developed acoustic features inspired by biological auditory perception that could be used to perform accurate and robust ASR. In contrast, Specifically, we evaluate the ASR accuracy and robustness of several biologically inspired acoustic features. In addition to several features from prior works, such as gammatone filterbank features (GammSpec), we also propose two new acoustic features called frequency masked spectrogram (FreqMask) and difference of gammatones spectrogram (DoGSpec) to simulate the neuro-psychological phenomena of frequency masking and lateral suppression. Experiments on diverse models and datasets show that (1) DoGSpec achieves significantly better robustness than the highly popular log mel spectrogram (LogMelSpec) with minimal accuracy degradation, and (2) GammSpec achieves better accuracy and robustness to non-adversarial noises from the Speech Robust Bench benchmark, but it is outperformed by DoGSpec against adversarial attacks.

frequency, robustness, speech recognition, (15 more...)

arXiv.org Artificial Intelligence

2409.16399

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Germany > Saxony > Leipzig (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.76)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Machine Anomalous Sound Detection Using Spectral-temporal Modulation Representations Derived from Machine-specific Filterbanks

Li, Kai, Zaman, Khalid, Li, Xingfeng, Akagi, Masato, Unoki, Masashi

arXiv.org Artificial IntelligenceSep-9-2024

Early detection of factory machinery malfunctions is crucial in industrial applications. In machine anomalous sound detection (ASD), different machines exhibit unique vibration-frequency ranges based on their physical properties. Meanwhile, the human auditory system is adept at tracking both temporal and spectral dynamics of machine sounds. Consequently, integrating the computational auditory models of the human auditory system with machine-specific properties can be an effective approach to machine ASD. We first quantified the frequency importances of four types of machines using the Fisher ratio (F-ratio). The quantified frequency importances were then used to design machine-specific non-uniform filterbanks (NUFBs), which extract the log non-uniform spectrum (LNS) feature. The designed NUFBs have a narrower bandwidth and higher filter distribution density in frequency regions with relatively high F-ratios. Finally, spectral and temporal modulation representations derived from the LNS feature were proposed. These proposed LNS feature and modulation representations are input into an autoencoder neural-network-based detector for ASD. The quantification results from the training set of the Malfunctioning Industrial Machine Investigation and Inspection dataset with a signal-to-noise (SNR) of 6 dB reveal that the distinguishing information between normal and anomalous sounds of different machines is encoded non-uniformly in the frequency domain. By highlighting these important frequency regions using NUFBs, the LNS feature can significantly enhance performance using the metric of AUC (area under the receiver operating characteristic curve) under various SNR conditions. Furthermore, modulation representations can further improve performance. Specifically, temporal modulation is effective for fans, pumps, and sliders, while spectral modulation is particularly effective for valves.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Artificial Intelligence

2409.05319

Country:

North America > United States (0.04)
Asia > Macao (0.04)
Asia > Japan (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Parseval Convolution Operators and Neural Networks

Unser, Michael, Ducotterd, Stanislas

arXiv.org Machine LearningAug-19-2024

We first establish a kernel theorem that characterizes all linear shift-invariant (LSI) operators acting on discrete multicomponent signals. This result naturally leads to the identification of the Parseval convolution operators as the class of energy-preserving filterbanks. We then present a constructive approach for the design/specification of such filterbanks via the chaining of elementary Parseval modules, each of which being parameterized by an orthogonal matrix or a 1-tight frame. Our analysis is complemented with explicit formulas for the Lipschitz constant of all the components of a convolutional neural network (CNN), which gives us a handle on their stability. Finally, we demonstrate the usage of those tools with the design of a CNN-based algorithm for the iterative reconstruction of biomedical images. Our algorithm falls within the plug-and-play framework for the resolution of inverse problems. It yields better-quality results than the sparsity-based methods used in compressed sensing, while offering essentially the same convergence and robustness guarantees.

operator, parseval convolution operator, reconstruction, (14 more...)

arXiv.org Machine Learning

2408.09981

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Switzerland > Vaud > Lausanne (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

ConcateNet: Dialogue Separation Using Local And Global Feature Concatenation

Halimeh, Mhd Modar, Torcoli, Matteo, Habets, Emanuël

arXiv.org Artificial IntelligenceAug-16-2024

Dialogue separation involves isolating a dialogue signal from a mixture, such as a movie or a TV program. This can be a necessary step to enable dialogue enhancement for broadcast-related applications. In this paper, ConcateNet for dialogue separation is proposed, which is based on a novel approach for processing local and global features aimed at better generalization for out-of-domain signals. ConcateNet is trained using a noise reduction-focused, publicly available dataset and evaluated using three datasets: two noise reduction-focused datasets (in-domain), which show competitive performance for ConcateNet, and a broadcast-focused dataset (out-of-domain), which verifies the better generalization performance for the proposed architecture compared to considered state-of-the-art noise-reduction methods.

concatenet, dataset, module, (17 more...)

arXiv.org Artificial Intelligence

2408.08729

Country:

North America > United States > Rhode Island (0.04)
Europe > Greece (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
(7 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Media (0.48)

Technology:

Information Technology > Data Science > Data Mining (0.78)
Information Technology > Artificial Intelligence > Speech (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Toward end-to-end interpretable convolutional neural networks for waveform signals

Vu, Linh, Tran, Thu, Lim, Wern-Han, Phan, Raphael

arXiv.org Artificial IntelligenceMay-2-2024

This paper introduces a novel convolutional neural networks (CNN) framework tailored for end-to-end audio deep learning models, presenting advancements in efficiency and explainability. By benchmarking experiments on three standard speech emotion recognition datasets with five-fold cross-validation, our framework outperforms Mel spectrogram features by up to seven percent. It can potentially replace the Mel-Frequency Cepstral Coefficients (MFCC) while remaining lightweight. Furthermore, we demonstrate the efficiency and interpretability of the front-end layer using the PhysioNet Heart Sound Database, illustrating its ability to handle and capture intricate long waveform patterns. Our contributions offer a portable solution for building efficient and interpretable models for raw waveform data.

convolutional neural network, dataset, neural network, (16 more...)

arXiv.org Artificial Intelligence

2405.01815

Country: Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback