mallat
Bayesian Scattering: A Principled Baseline for Uncertainty on Image Data
Fichera, Bernardo, Ivkovic, Zarko, Jorner, Kjell, Hennig, Philipp, Borovitskiy, Viacheslav
Uncertainty quantification for image data is dominated by complex deep learning methods, yet the field lacks an interpretable, mathematically grounded baseline. We propose Bayesian scattering to fill this gap, serving as a first-step baseline akin to the role of Bayesian linear regression for tabular data. Our method couples the wavelet scattering transform (a deep, non-learned feature extractor) with a simple probabilistic head. Because scattering features are derived from geometric principles rather than learned, they avoid overfitting the training distribution. This helps provide sensible uncertainty estimates even under significant distribution shifts. We validate this on diverse tasks, including medical imaging under institution shift, wealth mapping under country-to-country shift, and Bayesian optimization of molecular properties. Our results suggest that Bayesian scattering is a solid baseline for complex uncertainty quantification methods.
The Maximal Overlap Discrete Wavelet Scattering Transform and Its Application in Classification Tasks
Larrubia, Leonardo Fonseca, Morettin, Pedro Alberto, Chiann, Chang
We present the Maximal Overlap Discrete Wavelet Scattering Transform (MODWST), whose construction is inspired by the combination of the Maximal Overlap Discrete Wavelet Transform (MODWT) and the Scattering Wavelet Transform (WST). We also discuss the use of MODWST in classification tasks, evaluating its performance in two applications: stationary signal classification and ECG signal classification. The results demonstrate that MODWST achieved good performance in both applications, positioning itself as a viable alternative to popular methods like Convolutional Neural Networks (CNNs), particularly when the training data set is limited.
WhaleNet: a Novel Deep Learning Architecture for Marine Mammals Vocalizations on Watkins Marine Mammal Sound Database
Licciardi, Alessandro, Carbone, Davide
Marine mammal communication is a complex field, hindered by the diversity of vocalizations and environmental factors. The Watkins Marine Mammal Sound Database (WMMD) constitutes a comprehensive labeled dataset employed in machine learning applications. Nevertheless, the methodologies for data preparation, preprocessing, and classification documented in the literature exhibit considerable variability and are typically not applied to the dataset in its entirety. This study initially undertakes a concise review of the state-of-the-art benchmarks pertaining to the dataset, with a particular focus on clarifying data preparation and preprocessing techniques. Subsequently, we explore the utilization of the Wavelet Scattering Transform (WST) and Mel spectrogram as preprocessing mechanisms for feature extraction. In this paper, we introduce WhaleNet (Wavelet Highly Adaptive Learning Ensemble Network), a sophisticated deep ensemble architecture for the classification of marine mammal vocalizations, leveraging both WST and Mel spectrogram for enhanced feature discrimination. By integrating the insights derived from WST and Mel representations, we achieved an improvement in classification accuracy by 8-10% over existing architectures, corresponding to a classification accuracy of 97.61%.
Mean-Field Microcanonical Gradient Descent
Häggbom, Marcus, Karlsmark, Morten, Andén, Joakim
Microcanonical gradient descent is a sampling procedure for energy-based models allowing for efficient sampling of distributions in high dimension. It works by transporting samples from a high-entropy distribution, such as Gaussian white noise, to a low-energy region using gradient descent. We put this model in the framework of normalizing flows, showing how it can often overfit by losing an unnecessary amount of entropy in the descent. As a remedy, we propose a mean-field microcanonical gradient descent that samples several weakly coupled data points simultaneously, allowing for better control of the entropy loss while paying little in terms of likelihood fit. We study these models in the context of financial time series, illustrating the improvements on both synthetic and real data.
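The descent described in this abstract can be illustrated with a toy NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature map `moments` (first and second empirical moments), the target values, the step size, and the form of the mean-field coupling (matching the batch-averaged features instead of each sample's features).

```python
import numpy as np

def moments(x):
    """Hypothetical feature map: first and second empirical moments of each row."""
    return np.stack([x.mean(axis=1), (x ** 2).mean(axis=1)], axis=1)  # shape (B, 2)

def descend(x, target, steps=300, rate=0.05, mean_field=False):
    """Gradient descent on the squared feature error, starting from the rows of x.

    mean_field=False: every row is pushed to match the target features on its own.
    mean_field=True:  only the batch average of the features is matched, so the
                      rows are coupled through that average (an assumed form of coupling).
    """
    x = x.copy()
    for _ in range(steps):
        phi = moments(x)
        if mean_field:
            phi = phi.mean(axis=0, keepdims=True)  # shape (1, 2), shared by all rows
        err = phi - target
        # Gradient of ||phi - target||^2 with respect to x; the constant 1/n
        # (and 1/B in the mean-field case) factor is absorbed into `rate`.
        x -= rate * (2 * err[:, :1] + 4 * err[:, 1:] * x)
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal((16, 256))   # Gaussian white noise initial samples
target = np.array([[0.0, 2.0]])          # assumed target features

independent = descend(noise, target)
coupled = descend(noise, target, mean_field=True)

# Spread of the per-sample second moment after each kind of descent.
print(moments(independent)[:, 1].std(), moments(coupled)[:, 1].std())
```

In this toy setting the independent descent drives every sample onto the same feature values, while the coupled version only constrains the batch average and leaves sample-to-sample variability, which is the qualitative effect the abstract attributes to the mean-field variant.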
Wavelet Score-Based Generative Modeling
Guth, Florentin, Coste, Simon, De Bortoli, Valentin, Mallat, Stéphane
Score-based generative models (SGMs) synthesize new data samples from Gaussian white noise by running a time-reversed Stochastic Differential Equation (SDE) whose drift coefficient depends on some probabilistic score. The discretization of such SDEs typically requires a large number of time steps and hence a high computational cost. This is because of ill-conditioning properties of the score that we analyze mathematically. We show that SGMs can be considerably accelerated by factorizing the data distribution into a product of conditional probabilities of wavelet coefficients across scales. The resulting Wavelet Score-based Generative Model (WSGM) synthesizes wavelet coefficients with the same number of time steps at all scales, and its time complexity therefore grows linearly with the image size. This is proved mathematically over Gaussian distributions, and shown numerically over physical processes at phase transition and natural image datasets.
Phase Collapse in Neural Networks
Guth, Florentin, Zarka, John, Mallat, Stéphane
Deep convolutional image classifiers progressively transform the spatial variability into a smaller number of channels, which linearly separates all classes. A fundamental challenge is to understand the role of rectifiers together with convolutional filters in this transformation. Rectifiers with biases are often interpreted as thresholding operators which improve sparsity and discrimination. This paper demonstrates that it is a different phase collapse mechanism which explains the ability to progressively eliminate spatial variability, while improving linear class separation. This is explained and shown numerically by defining a simplified complex-valued convolutional network architecture. It implements spatial convolutions with wavelet filters and uses a complex modulus to collapse phase variables. This phase collapse network reaches the classification accuracy of ResNets of similar depths, whereas its performance is considerably degraded when replacing the phase collapse with thresholding operators. This is justified by explaining how iterated phase collapses progressively improve separation of class means, as opposed to thresholding non-linearities.
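The contrast between a complex modulus and a thresholding non-linearity can be seen on a single coefficient. The sketch below is a toy illustration only: the "filter" is one complex sinusoid standing in for a wavelet, and the thresholding operator is a bias-free ReLU on the real part; both are assumptions chosen for brevity, not the architecture from the paper.

```python
import numpy as np

def complex_coefficient(x, freq):
    """Inner product of x with a complex sinusoid (a crude stand-in for one wavelet filter)."""
    t = np.arange(len(x))
    return np.sum(x * np.exp(-2j * np.pi * freq * t / len(x)))

def phase_collapse(z):
    """Complex modulus: discards the phase and keeps the amplitude."""
    return np.abs(z)

def threshold(z):
    """Bias-free ReLU on the real part (an assumed, simplified thresholding operator)."""
    return np.maximum(z.real, 0.0)

n = 64
t = np.arange(n)
x = np.cos(2 * np.pi * 5 * t / n)   # oscillation at frequency index 5
x_shifted = np.roll(x, 3)           # the same signal translated by 3 samples

z = complex_coefficient(x, 5)
z_shifted = complex_coefficient(x_shifted, 5)

# Translation only rotates the phase of the coefficient, so the modulus is
# unchanged while the thresholded real part is not.
print(phase_collapse(z), phase_collapse(z_shifted))
print(threshold(z), threshold(z_shifted))
```

Here the translation moves the coefficient around a circle in the complex plane: the modulus is blind to that motion, whereas the thresholded value depends on where on the circle the coefficient lands.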
Sparse Multi-Family Deep Scattering Network
Cosentino, Romain, Balestriero, Randall
In this work, we propose the Sparse Multi-Family Deep Scattering Network (SMF-DSN), a novel architecture exploiting the interpretability of the Deep Scattering Network (DSN) and improving its expressive power. The DSN extracts salient and interpretable features from signals by cascading wavelet transforms and complex moduli, then extracting the representation of the data via a translation-invariant operator. First, leveraging the development of highly specialized wavelet filters over the last decades, we propose a multi-family approach to DSN. In particular, we propose to cross multiple wavelet transforms at each layer of the network, thus increasing the feature diversity and removing the need for an expert to select the appropriate filter. Secondly, we develop an optimal thresholding strategy suited to the DSN that regularizes the network and controls possible instabilities induced by the signals, such as non-stationary noise. Our systematic and principled solution sparsifies the network's latent representation by acting as a local mask distinguishing between activity and noise. The SMF-DSN enhances the DSN by (i) increasing the diversity of the scattering coefficients and (ii) improving its robustness with respect to non-stationary noise.
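The basic cascade this abstract builds on (wavelet transform, complex modulus, translation-invariant averaging) can be sketched in a few lines of NumPy for a 1D signal. This is a minimal sketch under stated assumptions: a single hypothetical family of Gaussian band-pass filters defined in the Fourier domain, circular convolution, and a global average as the invariant operator. It implements neither the multi-family crossing nor the thresholding strategy proposed in the paper.

```python
import numpy as np

def bandpass_bank(n, num_scales):
    """Hypothetical analytic band-pass filters, given by their frequency responses.

    The centre frequency halves at each scale; non-positive frequencies are
    zeroed so each filter is analytic and has zero mean.
    """
    freqs = np.fft.fftfreq(n)  # cycles per sample, in [-0.5, 0.5)
    bank = []
    for j in range(num_scales):
        centre = 0.25 / 2 ** j
        width = centre / 2
        response = np.exp(-0.5 * ((freqs - centre) / width) ** 2)
        bank.append(response * (freqs > 0))
    return np.stack(bank)  # shape (num_scales, n)

def modulus_wavelet(x, bank):
    """|x * psi_j| for every filter in the bank, via circular convolution in Fourier."""
    return np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * bank, axis=-1))

def scattering(x, num_scales=4):
    """Zeroth-, first- and second-order coefficients with global averaging."""
    bank = bandpass_bank(len(x), num_scales)
    u1 = modulus_wavelet(x, bank)             # first-layer envelopes, shape (J, n)
    s1 = u1.mean(axis=-1)                     # first-order coefficients
    s2 = []
    for j1 in range(num_scales - 1):
        # The envelope at scale j1 is only analysed at coarser scales j2 > j1.
        u2 = modulus_wavelet(u1[j1], bank[j1 + 1:])
        s2.extend(u2.mean(axis=-1))           # second-order coefficients
    return np.concatenate([[x.mean()], s1, np.array(s2)])

rng = np.random.default_rng(0)
signal = rng.standard_normal(128)
coeffs = scattering(signal)
coeffs_shifted = scattering(np.roll(signal, 7))
print(coeffs.shape, np.allclose(coeffs, coeffs_shifted))
```

Because every step commutes with circular shifts and the final operator is a global mean, the coefficients of a signal and of its circularly shifted copy agree up to floating-point error, which is the invariance property the abstract refers to.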
Interferometric Graph Transform: a Deep Unsupervised Graph Representation
We propose the Interferometric Graph Transform (IGT), which is a new class of deep unsupervised graph convolutional neural networks for building graph representations. Our first contribution is to propose a generic, complex-valued spectral graph architecture obtained from a generalization of the Euclidean Fourier transform. We show that our learned representation consists of both discriminative and invariant features, thanks to a novel greedy concave objective. From our experiments, we conclude that our learning procedure exploits the topology of the spectral domain, which is normally a flaw of spectral methods, and in particular our method can recover an analytic operator for vision tasks. We test our algorithm on various and challenging tasks such as image classification (MNIST, CIFAR-10), community detection (Authorship, Facebook graph) and action recognition from 3D skeleton videos (SBU, NTU), exhibiting a new state-of-the-art in spectral graph unsupervised settings.
Biologically inspired architectures for sample-efficient deep reinforcement learning
Richemond, Pierre H., Kolbeinsson, Arinbjörn, Guo, Yike
Deep reinforcement learning comes at a heavy price in terms of sample efficiency and overparameterization in the neural networks used for function approximation. In this work, we use tensor factorization in order to learn more compact representations for reinforcement learning policies. We show empirically that in the low-data regime, it is possible to learn online policies with 2 to 10 times fewer total coefficients, with little to no loss of performance. We also leverage progress in second-order optimization, and use the theory of wavelet scattering to further reduce the number of learned coefficients, by foregoing learning the topmost convolutional layer filters altogether. We evaluate our results on the Atari suite against recent baseline algorithms that represent the state-of-the-art in data efficiency, and get comparable results with an order of magnitude gain in weight parsimony.