Neural networks have seen a lot of hype in astronomy and cosmology recently (even just on this site!). However, it may be that the neural networks used to classify images in typical machine learning applications are overkill. To quote the authors of today's paper, "the cosmological density field is not as complex as random images of rabbits." Today's authors propose using a method called the "scattering transform" to take advantage of the best parts of neural networks with none of the limitations. The standard cosmological lore states that, early on, the universe underwent a phase of inflation that is responsible for laying down the initial conditions of the large-scale structure we see in the universe today.

To view the code, training visualizations, and more information about the Python example at the end of this post, visit the Comet project page. While much of the writing and literature on deep learning concerns computer vision and natural language processing (NLP), audio analysis -- a field that includes automatic speech recognition (ASR), digital signal processing, and music classification, tagging, and generation -- is a growing subdomain of deep learning applications. Some of the most popular and widespread machine learning systems, such as the virtual assistants Alexa, Siri, and Google Home, are largely products built atop models that can extract information from audio signals. Many of our users at Comet are working on audio-related machine learning tasks such as audio classification, speech recognition, and speech synthesis, so we built them tools to analyze, explore, and understand audio data using Comet's meta machine-learning platform. This post focuses on showing how data scientists and AI practitioners can use Comet to apply machine learning and deep learning methods in the domain of audio analysis.

Durall, Ricard, Keuper, Margret, Pfreundt, Franz-Josef, Keuper, Janis

Deep generative models have recently achieved impressive results for many real-world applications, successfully generating high-resolution and diverse samples from complex datasets. As a result, fake digital content has proliferated, raising concern and spreading distrust in image content, and creating an urgent need for automated ways to detect these AI-generated fake images. Although many face editing algorithms seem to produce realistic human faces, upon closer examination they exhibit artifacts in certain domains that are often hidden from the naked eye. In this work, we present a simple way to detect such fake face images - so-called DeepFakes. Our method is based on a classical frequency domain analysis followed by a basic classifier. Compared to previous systems, which need to be fed with large amounts of labeled data, our approach showed very good results using only a few annotated training samples and even achieved good accuracies in fully unsupervised scenarios. For the evaluation on high-resolution face images, we combined several public datasets of real and fake faces into a new benchmark: Faces-HQ. Given such high-resolution images, our approach reaches a perfect classification accuracy of 100% when it is trained on as few as 20 annotated samples. In a second experiment, evaluating the medium-resolution images of the CelebA dataset, our method achieves 100% accuracy supervised and 96% in an unsupervised setting. Finally, evaluating low-resolution video sequences of the FaceForensics++ dataset, our method achieves 91% accuracy detecting manipulated videos. Source Code: https://github.com/cc-hpc-itwm/DeepFakeDetection
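The abstract above does not say which frequency-domain feature is used. As a rough illustration only, one common choice is the azimuthally averaged power spectrum of the image, which reduces the 2D spectrum to a short 1D profile that a basic classifier could take as input. The function name `azimuthal_power_spectrum` and the synthetic test image are assumptions made for this sketch, not taken from the paper or its repository.

```python
import numpy as np


def azimuthal_power_spectrum(image):
    """Return the azimuthally averaged power spectrum of a 2D array.

    Hypothetical helper for illustration; not the authors' code.
    """
    image = np.asarray(image, dtype=float)
    # 2D power spectrum with the zero frequency shifted to the array center
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    height, width = power.shape
    y, x = np.indices(power.shape)
    # Integer radial bin (distance from the spectrum center) for every pixel
    radius = np.hypot(y - height // 2, x - width // 2).astype(int)
    # Mean power in each radial bin
    total = np.bincount(radius.ravel(), weights=power.ravel())
    count = np.bincount(radius.ravel())
    return total / count


# Synthetic 16x16 image with a single horizontal frequency of 2 cycles
cols = np.arange(16)
wave = np.tile(np.cos(2 * np.pi * 2 * cols / 16), (16, 1))
profile = azimuthal_power_spectrum(wave)
print(int(np.argmax(profile)))  # index of the radial bin holding the most power
```

Each image then maps to one short feature vector, so even a small labeled set may be enough to fit a simple classifier on top of it.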

The manual design of analog circuits is a tedious task of parameter tuning that requires hours of work by human experts. In this work, we make a significant step towards a fully automatic design method that is based on deep learning. The method selects the components and their configuration, as well as their numerical parameters. By contrast, methods in the current literature are limited to the parameter-fitting part only. A two-stage network is used, which first generates a chain of circuit components and then predicts their parameters. A hypernetwork scheme is used in which a weight-generating network, which is conditioned on the circuit's power spectrum, produces the parameters of a primal RNN that places the components. A differential simulator is used for refining the numerical values of the components. We show that our model provides an efficient design solution and is superior to alternative solutions.

Caldeira, J., Wu, W. L. K., Nord, B., Avestruz, C., Trivedi, S., Story, K. T.

Next-generation cosmic microwave background (CMB) experiments will have lower noise and therefore increased sensitivity, enabling improved constraints on fundamental physics parameters such as the sum of neutrino masses and the tensor-to-scalar ratio r. Achieving competitive constraints on these parameters requires high signal-to-noise extraction of the projected gravitational potential from the CMB maps. Standard methods for reconstructing the lensing potential employ the quadratic estimator (QE). However, the QE performs suboptimally at the low noise levels expected in upcoming experiments. Other methods, like maximum likelihood estimators (MLE), are under active development. In this work, we demonstrate reconstruction of the CMB lensing potential with a deep convolutional neural network (CNN), specifically a ResUNet. The network is trained and tested on simulated data, and otherwise has no physical parametrization related to the physical processes of the CMB and gravitational lensing. We show that, over a wide range of angular scales, ResUNets recover the input gravitational potential with a higher signal-to-noise ratio than the QE method, reaching levels comparable to analytic approximations of MLE methods. We demonstrate that the network outputs quantifiably different lensing maps when given input CMB maps generated with different cosmologies. We also show that we can use the reconstructed lensing map for cosmological parameter estimation. This application of CNNs provides a few innovations at the intersection of cosmology and machine learning. First, while training and regressing on images, we predict a continuous-variable field rather than discrete classes. Second, we are able to establish uncertainty measures for the network output that are analogous to standard methods. We expect this approach to excel in capturing hard-to-model non-Gaussian astrophysical foreground and noise contributions.

Kailkhura, Bhavya, Thiagarajan, Jayaraman J., Li, Qunwei, Bremer, Peer-Timo

This paper provides a general framework to study the effect of sampling properties of training data on the generalization error of the learned machine learning (ML) models. Specifically, we propose a new spectral analysis of the generalization error, expressed in terms of the power spectra of the sampling pattern and the function involved. The framework is built in Euclidean space using Fourier analysis and establishes a connection between some high-dimensional geometric objects and the optimal spectral form of different state-of-the-art sampling patterns. Subsequently, we estimate the expected error bounds and convergence rate of different state-of-the-art sampling patterns as the number of samples and dimensions increases. We make several observations about generalization error which are valid irrespective of the approximation scheme (or learning architecture) and training (or optimization) algorithms. Our result also sheds light on ways to formulate design principles for constructing optimal sampling methods for particular problems.

Rodriguez, Andres C, Kacprzak, Tomasz, Lucchi, Aurelien, Amara, Adam, Sgier, Raphael, Fluri, Janis, Hofmann, Thomas, Réfrégier, Alexandre

Dark matter in the universe evolves through gravity to form a complex network of halos, filaments, sheets and voids that is known as the cosmic web. Computational models of the underlying physical processes, such as classical N-body simulations, are extremely resource intensive, as they track the action of gravity in an expanding universe using billions of particles as tracers of the cosmic matter distribution. Therefore, upcoming cosmology experiments will face a computational bottleneck that may limit the exploitation of their full scientific potential. To address this challenge, we demonstrate the application of a machine learning technique called Generative Adversarial Networks (GAN) to learn models that can efficiently generate new, physically realistic realizations of the cosmic web. Our training set is a small, representative sample of 2D image snapshots from N-body simulations of size 500 and 100 Mpc. We show that the GAN-produced results are qualitatively and quantitatively very similar to the originals. Generation of a new cosmic web realization with a GAN takes a fraction of a second, compared to the many hours needed by the N-body technique. We anticipate that GANs will therefore play an important role in providing extremely fast and precise simulations of the cosmic web in the era of large cosmological surveys, such as Euclid and LSST.

Gupta, Arushi, Matilla, José Manuel Zorrilla, Hsu, Daniel, Haiman, Zoltán

Weak lensing maps contain information beyond two-point statistics on small scales. Much recent work has tried to extract this information through a range of different observables or via nonlinear transformations of the lensing field. Here we train and apply a 2D convolutional neural network to simulated noiseless lensing maps covering 96 different cosmological models over a range of {$\Omega_m,\sigma_8$}. Using the area of the confidence contour in the {$\Omega_m,\sigma_8$} plane as a figure-of-merit, derived from simulated convergence maps smoothed on a scale of 1.0 arcmin, we show that the neural network yields $\approx 5 \times$ tighter constraints than the power spectrum, and $\approx 4 \times$ tighter than the lensing peaks. Such gains illustrate the extent to which weak lensing data encode cosmological information not accessible to the power spectrum or even non-Gaussian statistics such as lensing peaks.
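To make the figure-of-merit above concrete, here is a minimal sketch that assumes the constraint in the ($\Omega_m,\sigma_8$) plane is well approximated by a 2D Gaussian, in which case the confidence contour is an ellipse with area $\pi\,\Delta\chi^2\sqrt{\det C}$. The Gaussian approximation, the function name `contour_area`, and the covariance values are all assumptions for illustration; the abstract does not say how the contour area was computed, and the numbers are not results from the paper.

```python
import math


def contour_area(cov, level=0.68):
    """Area enclosed by the `level` confidence ellipse of a 2D Gaussian.

    Hypothetical helper for illustration; assumes a Gaussian posterior.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # For 2 degrees of freedom, P(chi2 <= q) = 1 - exp(-q / 2)
    delta_chi2 = -2.0 * math.log(1.0 - level)
    return math.pi * delta_chi2 * math.sqrt(det)


# Made-up covariances for two statistics (illustrative values only)
cov_baseline = [[4.0e-4, -3.0e-4], [-3.0e-4, 9.0e-4]]
cov_tighter = [[entry / 5.0 for entry in row] for row in cov_baseline]

ratio = contour_area(cov_baseline) / contour_area(cov_tighter)
print(f"area ratio: {ratio:.2f}")
```

Under this approximation the ratio of areas does not depend on the confidence level chosen, since $\Delta\chi^2$ cancels, so the comparison between two statistics reduces to the ratio of $\sqrt{\det C}$.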

Ravanbakhsh, Siamak, Oliva, Junier, Fromenteau, Sebastien, Price, Layne C., Ho, Shirley, Schneider, Jeff, Poczos, Barnabas

A grand challenge of 21st-century cosmology is to accurately estimate the cosmological parameters of our Universe. A major approach to estimating the cosmological parameters is to use the large-scale matter distribution of the Universe. Galaxy surveys provide the means to map out cosmic large-scale structure in three dimensions. Information about galaxy locations is typically summarized in a "single" function of scale, such as the galaxy correlation function or power spectrum. We show that it is possible to estimate these cosmological parameters directly from the distribution of matter. This paper presents the application of deep 3D convolutional networks to volumetric representations of dark-matter simulations, as well as the results obtained using a recently proposed distribution regression framework, showing that machine learning techniques are comparable to, and can sometimes outperform, maximum-likelihood point estimates using "cosmological models". This opens the way to estimating the parameters of our Universe with higher accuracy.