Advection Augmented Convolutional Neural Networks Siddharth Rout
Many problems in the physical sciences involve the prediction of space-time sequences, ranging from weather prediction to the analysis of disease propagation and video prediction. Modern techniques for solving such problems typically combine a Convolutional Neural Network (CNN) architecture with a time-prediction mechanism. However, such approaches often underperform in the long-range propagation of information and lack explainability. In this work, we introduce a physically inspired architecture for the solution of such problems. Namely, we propose to augment CNNs with advection by designing a novel semi-Lagrangian push operator. We show that the proposed operator allows for the non-local transformation of information, in contrast to standard convolutional kernels. We then complement it with Reaction and Diffusion neural components to form a network that mimics the Reaction-Advection-Diffusion equation in high dimensions. We demonstrate the effectiveness of our network on a number of spatio-temporal datasets, showing its merit.
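To illustrate how such a push operator can move information non-locally across CNN feature maps, the sketch below implements a semi-Lagrangian advection step in PyTorch via backward characteristic tracing and bilinear resampling with torch.nn.functional.grid_sample. The function name, the per-pixel velocity parameterization, and the boundary handling are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def semi_lagrangian_push(u, vx, vy, dt=1.0):
    """Advect feature maps u (B, C, H, W) along a velocity field (vx, vy) in pixels.

    Each output pixel samples the location the flow carries it from (a backward,
    semi-Lagrangian step), so information can travel many pixels in one
    application, unlike a small convolutional stencil.
    """
    B, C, H, W = u.shape
    # Base sampling grid in the normalized [-1, 1] coordinates expected by grid_sample.
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, H, device=u.device),
        torch.linspace(-1.0, 1.0, W, device=u.device),
        indexing="ij",
    )
    # Trace characteristics backward: x_src = x - dt * v(x), converted to normalized units.
    grid_x = xs.unsqueeze(0) - dt * vx * (2.0 / max(W - 1, 1))
    grid_y = ys.unsqueeze(0) - dt * vy * (2.0 / max(H - 1, 1))
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(u, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)

# Example: push a feature map to the right by 3 pixels in a single step.
u = torch.randn(2, 8, 32, 32)
vx = torch.full((2, 32, 32), 3.0)
vy = torch.zeros(2, 32, 32)
u_adv = semi_lagrangian_push(u, vx, vy)
```

Because the sampling locations can lie arbitrarily far from the output pixel, one such step transports features across distances that would otherwise require stacking many small convolutions.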
Supplementary Material
We will shortly transfer the FixMatch code to the CORDS repository so that there is a single unified repository.

A.2 Licenses

We release both the code repositories of R. Nevertheless, the authors of DS3L [17] made the code available for everyone to use. Nevertheless, the owner of the repository made the code available for everyone to use. The CIFAR10 dataset is released under an MIT license. The MNIST dataset is released under a Creative Commons Attribution-Share Alike 3.0 license.
Coreset Selection for Efficient and Robust Semi-Supervised Learning
Semi-supervised learning (SSL) algorithms have had great success in recent years in limited labeled-data regimes. However, the current state-of-the-art SSL algorithms are computationally expensive and entail significant compute time and energy requirements, which can be a major limitation for many smaller companies and academic groups. Our main insight is that training on a subset of the unlabeled data, instead of the entire unlabeled set, enables current SSL algorithms to converge faster while significantly reducing computational costs.
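In loop form, this insight amounts to periodically selecting a small subset (coreset) of the unlabeled pool and running the SSL updates only on that subset. The sketch below is a minimal illustration under assumed names (select_coreset, make_loader, train_ssl_epoch) and uses pseudo-label confidence purely as a placeholder scoring rule; it is not the paper's actual selection objective.

```python
import torch

def select_coreset(model, unlabeled_loader, budget):
    """Score unlabeled examples and keep the `budget` highest-scoring ones.

    Confidence of the current pseudo-label is used here only as a placeholder
    scoring rule; a real coreset objective may differ substantially.
    """
    scores, indices = [], []
    model.eval()
    with torch.no_grad():
        for idx, x in unlabeled_loader:  # assumed to yield (index, image) pairs
            conf = model(x).softmax(dim=-1).max(dim=-1).values
            scores.append(conf)
            indices.append(idx)
    scores = torch.cat(scores)
    indices = torch.cat(indices)
    top = scores.topk(budget).indices
    return indices[top]  # ids of the selected unlabeled subset

# Inside an outer training loop, the subset is refreshed every R epochs and the
# (expensive) SSL updates only ever touch this subset:
#   if epoch % R == 0:
#       coreset_ids = select_coreset(model, unlabeled_loader, budget)
#       subset_loader = make_loader(unlabeled_data, coreset_ids)   # hypothetical helper
#   train_ssl_epoch(model, labeled_loader, subset_loader)          # hypothetical helper
```

The per-epoch cost then scales with the coreset size rather than with the full unlabeled pool, which is where the compute savings come from.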
Large-Scale Adversarial Training for Vision-and-Language Representation Learning: Supplementary Material
This supplementary material contains three sections. Section A.1 reviews additional related work. Section A.2 provides additional experimental results. Section A.3 describes downstream tasks and implementation details.

A.1 Additional Related Work

Adversarial Training. Many efforts have been devoted to improving AT from different angles: (i) use triplet-wise metric learning [8, 7] and optimal transport [20] to leverage inter-sample interactions; (ii) exploit extra unlabeled training data [12, 1]; and (iii) accelerate the training procedure [11, 19, 14].
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Instead of adding adversarial perturbations to image pixels and textual tokens, we propose to perform adversarial training in the embedding space of each modality. To enable large-scale training, we adopt the "free" adversarial training strategy and combine it with KL-divergence-based regularization to promote higher invariance in the embedding space.
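To make the idea concrete, the following sketch shows a single adversarial step on the image-embedding branch, with a KL term tying the perturbed prediction to the clean one. The one-step perturbation is a simplification of the multi-step "free" scheme, and the model interface, hyperparameters, and function name are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def adv_embedding_step(model, img_emb, txt_emb, labels, eps=1e-2, alpha=1e-3):
    """One adversarial update in the image-embedding space.

    A perturbation delta is computed to increase the loss; the model is then
    trained on clean loss + adversarial loss + a KL term that keeps the
    perturbed prediction close to the clean one.
    """
    clean_logits = model(img_emb, txt_emb)          # model: assumed (img, txt) -> logits
    clean_loss = F.cross_entropy(clean_logits, labels)

    # Compute a perturbation on the image embeddings (single gradient-sign step).
    delta = torch.zeros_like(img_emb, requires_grad=True)
    adv_loss = F.cross_entropy(model(img_emb + delta, txt_emb), labels)
    grad, = torch.autograd.grad(adv_loss, delta)
    delta = (alpha * grad.sign()).clamp(-eps, eps).detach()

    # Re-evaluate with the fixed perturbation and add the KL regularizer.
    adv_logits = model(img_emb + delta, txt_emb)
    kl = F.kl_div(F.log_softmax(adv_logits, dim=-1),
                  F.softmax(clean_logits.detach(), dim=-1),
                  reduction="batchmean")
    return clean_loss + F.cross_entropy(adv_logits, labels) + kl
```

The same construction applies symmetrically to the text-embedding branch; the "free" variant additionally reuses the gradients of the main backward pass to refine delta across consecutive minibatch replays.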
Mean-Field Langevin Dynamics for Signed Measures via a Bilevel Approach Guillaume Wang Lénaïc Chizat
Mean-field Langevin dynamics (MFLD) is a class of interacting particle methods that tackle convex optimization over probability measures on a manifold and that are scalable, versatile, and enjoy computational guarantees. However, some important problems - such as risk minimization for infinite-width two-layer neural networks, or sparse deconvolution - are originally defined over the set of signed, rather than probability, measures. In this paper, we investigate how to extend the MFLD framework to convex optimization problems over signed measures. Of the two known reductions from signed to probability measures - the lifting and the bilevel approaches - we show that the bilevel reduction leads to stronger guarantees and faster rates (at the price of a higher per-iteration complexity). In particular, we investigate the convergence rate of MFLD applied to the bilevel reduction in the low-noise regime and obtain two results. First, this dynamics is amenable to an annealing schedule, adapted from [SWON23], which results in improved convergence rates to a fixed multiplicative accuracy. Second, we investigate the problem of learning a single neuron with the bilevel approach and obtain local exponential convergence rates that depend polynomially on the dimension and noise level (in contrast with the exponential dependence that would result from prior analyses).
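For readers unfamiliar with MFLD itself, the sketch below implements its simplest particle discretization - noisy gradient descent on an interacting particle system, here the hidden neurons of a mean-field two-layer network. It illustrates plain MFLD over probability measures only; the bilevel reduction, annealing schedule, and single-neuron analysis from the paper are not reproduced, and all names and hyperparameters are illustrative.

```python
import torch

def mfld_step(particles, loss_fn, step_size, noise_level):
    """One mean-field Langevin step: gradient descent on the particles plus
    Gaussian noise, a discretization of
        d theta_i = -grad V[mu](theta_i) dt + sqrt(2 * lambda) dB_t,
    where mu is the empirical measure of the particles."""
    particles = particles.detach().requires_grad_(True)
    loss = loss_fn(particles)
    grad, = torch.autograd.grad(loss, particles)
    noise = torch.randn_like(particles)
    return (particles - step_size * grad
            + (2.0 * step_size * noise_level) ** 0.5 * noise).detach()

# Toy objective: fit a target function with a two-layer ReLU network whose
# hidden neurons are the particles (mean-field parameterization).
def two_layer_loss(particles, x, y):
    w, a = particles[:, :-1], particles[:, -1]            # input weights, output scales
    pred = torch.relu(x @ w.T) @ a / particles.shape[0]   # average over neurons
    return 0.5 * ((pred - y) ** 2).mean()

d, n_particles = 5, 256
x = torch.randn(128, d)
y = torch.sin(x[:, 0])
particles = torch.randn(n_particles, d + 1)
for _ in range(200):
    particles = mfld_step(particles, lambda p: two_layer_loss(p, x, y),
                          step_size=0.1, noise_level=1e-3)
```

In this picture, each particle is a point on the parameter manifold and the empirical measure of the particle system approximates the optimized probability measure; handling signed measures requires an additional reduction such as the lifting or bilevel constructions discussed above.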