
Collaborating Authors

 Balazs, Peter


Construction of generalized samplets in Banach spaces

arXiv.org Artificial Intelligence

Recently, samplets have been introduced as localized discrete signed measures tailored to an underlying data set. Samplets exhibit vanishing moments, i.e., their measure integrals vanish for all polynomials up to a certain degree, which enables feature detection and data compression. In the present article, we extend the construction steps of samplets to functionals in Banach spaces that are more general than point evaluations. To obtain stable representations, we assume that these functionals form frames, or even Riesz bases, with square-summable coefficients. In either case, the corresponding analysis operator is injective, and we obtain samplet bases with the desired properties by constructing an isometry on the analysis operator's image. Assuming that the dual of the Banach space under consideration is embedded into the space of compactly supported distributions, the multilevel hierarchy for the generalized samplet construction is obtained by spectral clustering of a similarity graph for the functionals' supports. Based on this multilevel hierarchy, generalized samplets exhibit vanishing moments with respect to a given set of primitives within the Banach space. We derive an abstract localization result for the generalized samplet coefficients in terms of the samplets' support sizes and the approximability of the Banach space elements by the chosen primitives. Finally, we present three examples showcasing the generalized samplet framework.
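For orientation, the vanishing-moment property described above admits a compact formulation; the notation below (coefficients \beta_{j,k}, data sites x_k, polynomial degree q) is chosen for illustration and is not taken verbatim from the article.

    \sigma_j \;=\; \sum_{k} \beta_{j,k}\, \delta_{x_k},
    \qquad
    \int p \,\mathrm{d}\sigma_j \;=\; \sum_{k} \beta_{j,k}\, p(x_k) \;=\; 0
    \quad \text{for all polynomials } p \text{ with } \deg p \le q.

In the generalized construction, the point evaluations \delta_{x_k} are replaced by more general functionals in the dual space, and the polynomials by the chosen set of primitives.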


(Almost) Smooth Sailing: Towards Numerical Stability of Neural Networks Through Differentiable Regularization of the Condition Number

arXiv.org Machine Learning

Maintaining numerical stability in machine learning models is crucial for their reliability and performance. One approach to maintaining the stability of a network layer is to integrate the condition number of its weight matrix as a regularizing term into the optimization objective. However, due to its discontinuous nature and lack of differentiability, the condition number is not suitable for a gradient-descent approach. This paper introduces a novel regularizer that is provably differentiable almost everywhere and promotes matrices with low condition numbers. In particular, we derive a formula for the gradient of this regularizer that can be easily implemented and integrated into existing optimization algorithms. We show the advantages of this approach for noisy classification and for denoising of MNIST images.
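As an illustration of the general idea, not the regularizer derived in the paper, the sketch below penalizes the ratio of the largest to the smallest singular value of a weight matrix and relies on PyTorch differentiating through the singular value decomposition; variable names and the 1e-3 weight are placeholders.

    import torch

    def condition_number_penalty(W: torch.Tensor) -> torch.Tensor:
        # 2-norm condition number of W: largest over smallest singular value.
        s = torch.linalg.svdvals(W)  # singular values in descending order
        return s[0] / s[-1]

    W = torch.randn(64, 32, requires_grad=True)
    task_loss = (W ** 2).mean()  # placeholder for an actual training loss
    loss = task_loss + 1e-3 * condition_number_penalty(W)
    loss.backward()  # gradients flow through the singular value decomposition

Note that this plain ratio is ill-behaved where singular values coincide; the paper's contribution is precisely a regularizer that is provably differentiable almost everywhere, together with an explicit gradient formula.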


Instabilities in Convnets for Raw Audio

arXiv.org Artificial Intelligence

What makes waveform-based deep learning so hard? Despite numerous attempts at training convolutional neural networks (convnets) for filterbank design, they often fail to outperform hand-crafted baselines. These baselines are linear time-invariant systems: as such, they can be approximated by convnets with wide receptive fields. Yet, in practice, gradient-based optimization leads to suboptimal approximations. In our article, we approach this phenomenon from the perspective of initialization. We present a theory of large deviations for the energy response of FIR filterbanks with random Gaussian weights. We find that deviations worsen for large filters and locally periodic input signals, which are both typical for audio signal processing applications. Numerical simulations align with our theory and suggest that the condition number of a convolutional layer follows a logarithmic scaling law between the number and length of the filters, which is reminiscent of discrete wavelet bases.
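The scaling behavior described above can be probed with a small simulation; here the condition number of the analysis operator of a random Gaussian FIR filterbank (assuming circular convolution) is taken as the square root of the ratio between the maximum and the minimum of its energy response over frequencies. This mirrors the quantity studied in the article rather than reproducing its experiments.

    import numpy as np

    def energy_response(filters: np.ndarray, n_fft: int = 1024) -> np.ndarray:
        # Littlewood-Paley sum: squared magnitude responses summed over all filters.
        return (np.abs(np.fft.rfft(filters, n=n_fft, axis=1)) ** 2).sum(axis=0)

    rng = np.random.default_rng(0)
    for n_filters, filter_len in [(32, 64), (32, 512), (128, 512)]:
        w = rng.normal(size=(n_filters, filter_len)) / np.sqrt(n_filters * filter_len)
        resp = energy_response(w)
        # Condition number of the circular-convolution analysis operator:
        # square root of the ratio of the frame bounds (max/min of the energy response).
        kappa = np.sqrt(resp.max() / resp.min())
        print(f"{n_filters} filters of length {filter_len}: condition number ~ {kappa:.1f}")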


Fitting Auditory Filterbanks with Multiresolution Neural Networks

arXiv.org Artificial Intelligence

Waveform-based deep learning faces a dilemma between nonparametric and parametric approaches. On one hand, convolutional neural networks (convnets) may approximate any linear time-invariant system; yet, in practice, their frequency responses become more irregular as their receptive fields grow. On the other hand, a parametric model such as LEAF is guaranteed to yield Gabor filters, hence an optimal time-frequency localization; yet, this strong inductive bias comes at the expense of representational capacity. In this paper, we aim to overcome this dilemma by introducing a neural audio model, named multiresolution neural network (MuReNN). The key idea behind MuReNN is to train separate convolutional operators over the octave subbands of a discrete wavelet transform (DWT). Since the scale of DWT atoms grows exponentially between octaves, the receptive fields of the subsequent learnable convolutions in MuReNN are dilated accordingly. For a given real-world dataset, we fit the magnitude response of MuReNN to that of a well-established auditory filterbank: Gammatone for speech, CQT for music, and third-octave for urban sounds, respectively. This is a form of knowledge distillation (KD), in which the filterbank ''teacher'' is engineered by domain knowledge while the neural network ''student'' is optimized from data. We compare MuReNN to the state of the art in terms of goodness of fit after KD on a hold-out set and in terms of Heisenberg time-frequency localization. Compared to convnets and Gabor convolutions, we find that MuReNN reaches state-of-the-art performance on all three optimization problems.
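The key architectural idea can be sketched in a few lines; the wavelet ("db8"), the decomposition depth, and the kernel size below are placeholders chosen for illustration rather than the authors' configuration, and PyWavelets is assumed for the discrete wavelet transform.

    import numpy as np
    import pywt
    import torch

    class OctaveConvs(torch.nn.Module):
        # One learnable convolution per octave subband; since each subband is
        # already downsampled by the DWT, the effective receptive field in
        # samples doubles from one octave to the next.
        def __init__(self, n_octaves: int, kernel_size: int = 9):
            super().__init__()
            self.convs = torch.nn.ModuleList(
                [torch.nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2)
                 for _ in range(n_octaves)]
            )

        def forward(self, subbands):
            # subbands: list of tensors of shape (batch, 1, time_j), coarse to fine
            return [conv(x) for conv, x in zip(self.convs, subbands)]

    # Decompose a waveform into octave subbands with a discrete wavelet transform.
    x = np.random.randn(4096)
    coeffs = pywt.wavedec(x, "db8", level=5)  # [cA5, cD5, cD4, cD3, cD2, cD1]
    subbands = [torch.from_numpy(np.asarray(c, dtype=np.float32))[None, None, :]
                for c in coeffs]

    model = OctaveConvs(n_octaves=len(subbands))
    outputs = model(subbands)  # one learnably filtered signal per octave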


Similarity graphs for the concealment of long duration data loss in music

arXiv.org Artificial Intelligence

We present a novel method for the compensation of long-duration data gaps in audio signals, in particular music. The concealment of such signal defects is based on a graph that encodes signal structure in terms of time-persistent spectral similarity. A suitable candidate segment for the substitution of the lost content is proposed by an intuitive optimization scheme and smoothly inserted into the gap. Extensive listening tests show that the proposed algorithm provides highly promising results when applied to a variety of real-world music signals.
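A toy sketch conveys the flavor of the approach; here frame similarity is the cosine similarity between magnitude-spectrogram frames, and the candidate segment is selected by matching the frames bordering the gap. This is a strong simplification of the paper's graph-based optimization and of the smooth insertion step, with all parameters chosen for illustration.

    import numpy as np

    def frame_spectrogram(x: np.ndarray, n_fft: int = 1024, hop: int = 512) -> np.ndarray:
        # Magnitude spectrogram, one row per frame.
        frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
        return np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))

    def best_substitute(spec: np.ndarray, gap_start: int, gap_len: int) -> int:
        # Index of the candidate segment whose first and last frames best match
        # the intact frames just before and just after the gap.
        left, right = spec[gap_start - 1], spec[gap_start + gap_len]

        def cos(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

        best_i, best_score = 0, -np.inf
        for i in range(1, len(spec) - gap_len):
            if gap_start - gap_len < i < gap_start + gap_len:
                continue  # do not draw the substitute from inside the gap region
            score = cos(left, spec[i]) + cos(right, spec[i + gap_len - 1])
            if score > best_score:
                best_i, best_score = i, score
        return best_i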