A Proof of Prop.
This appendix fragment collects pieces of the proofs. The upper bound on the gradient's Frobenius norm for spectrally normalized discriminators follows directly from Theorem 1.1 in [44]. Proposition 3 (upper bound of the Hessian's spectral norm) concerns the discriminator defined in Eq. (1), with the proof deferred to the appendix; Corollary 1 specializes the bound to spectral normalization, and moreover, if the activation of the last layer is sigmoid (e.g., for the vanilla GAN [14]), a bound on H also holds. The desired inequalities then follow by induction. Returning to the proof of the proposition: applying Eq. (5) and Lemma 1 yields a lower bound on (10), and a further lemma upper bounds the square of the infinity norm of the relevant vector.
Why Spectral Normalization Stabilizes GANs: Analysis and Improvements
Spectral normalization (SN) [30] is a widely-used technique for improving the stability and sample quality of Generative Adversarial Networks (GANs). However, current understanding of SN's efficacy is limited. In this work, we show that SN controls two important failure modes of GAN training: exploding and vanishing gradients. Our proofs illustrate a (perhaps unintentional) connection with the successful LeCun initialization [25]. This connection helps to explain why the most popular implementation of SN for GANs [30] requires no hyper-parameter tuning, whereas stricter implementations of SN [15, 12] have poor empirical performance out-of-the-box. Unlike LeCun initialization which only controls gradient vanishing at the beginning of training, SN preserves this property throughout training. Building on this theoretical understanding, we propose a new spectral normalization technique: Bidirectional Scaled Spectral Normalization (BSSN), which incorporates insights from later improvements to LeCun initialization: Xavier initialization [13] and Kaiming initialization [17]. Theoretically, we show that BSSN gives better gradient control than SN. Empirically, we demonstrate that it outperforms SN in sample quality and training stability on several benchmark datasets.
Flow Matching for Scalable Simulation-Based Inference
Neural posterior estimation methods based on discrete normalizing flows have
Figure 1: Comparison of network architectures (left) and flow trajectories (right). Discrete flows (NPE, top) require a specialized architecture for the density estimator. Continuous flows (FMPE, bottom) are based on a vector field parametrized with an unconstrained architecture. FMPE uses this additional flexibility to put an enhanced emphasis on the conditioning data x, which in the SBI context is typically high dimensional and in a complex domain. Further, the optimal transport path produces simple flow trajectories from the base distribution (inset) to the target.
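For context, the optimal transport (conditional) path mentioned in the caption interpolates linearly between a base sample and a target sample, and the regression target for the vector field is simply their difference. A hedged NumPy sketch of one training pair (names are illustrative; the actual FMPE network additionally conditions on the observation x):

```python
import numpy as np

def ot_path_sample(x0, x1, t):
    """Optimal-transport conditional path: a straight line from base sample
    x0 to target sample x1, traversed with constant velocity x1 - x0."""
    xt = (1 - t) * x0 + t * x1
    ut = x1 - x0               # regression target for the vector field
    return xt, ut

rng = np.random.default_rng(1)
x0 = rng.normal(size=2)        # base distribution sample (standard normal)
x1 = np.array([3.0, -1.0])     # sample from the target
t = rng.uniform()
xt, ut = ot_path_sample(x0, x1, t)
# The flow-matching loss would then be || v_theta(xt, t, x) - ut ||^2.
```

Because the conditional velocity is constant along each straight-line path, the learned flow trajectories stay simple, which is the property the figure's inset illustrates.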
Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient-descent-ascent (OGDA). While both algorithms enjoy O(1/T) ergodic convergence to Nash equilibrium in two-player zero-sum games, OMWU offers several advantages, including logarithmic dependence on the size of the payoff matrix and Õ(1/T) convergence to coarse correlated equilibria even in general-sum games. However, in terms of last-iterate convergence in two-player zero-sum games, an increasingly popular topic in this area, OGDA guarantees that the duality gap shrinks at a rate of O(1/√T), while the best existing last-iterate convergence for OMWU depends on some game-dependent constant that could be arbitrarily large. This begs the question: is this potentially slow last-iterate convergence an inherent disadvantage of OMWU, or is the current analysis too loose? Somewhat surprisingly, we show that the former is true. More generally, we prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue: for any arbitrarily small δ > 0, there exists a 2×2 matrix game such that the algorithm admits a constant duality gap even after 1/δ rounds. This class of algorithms includes OMWU and other standard optimistic follow-the-regularized-leader algorithms.
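For reference, the OMWU update exponentiates the current gradient plus an optimistic correction from the previous round (2·g_t − g_{t−1}). A minimal NumPy sketch on a 2×2 zero-sum game with a unique interior equilibrium (a simplified illustration of the algorithm, not one of the hard instances constructed in the paper; the game matrix and step size are my own choices):

```python
import numpy as np

def omwu(A, eta=0.02, T=5000):
    """Optimistic multiplicative weights update for both players of the
    zero-sum game max_x min_y x^T A y, starting from uniform strategies."""
    n, m = A.shape
    x, y = np.ones(n) / n, np.ones(m) / m
    gx_prev, gy_prev = A @ y, A.T @ x
    for _ in range(T):
        gx, gy = A @ y, A.T @ x
        x = x * np.exp(eta * (2 * gx - gx_prev))    # maximizer ascends
        y = y * np.exp(-eta * (2 * gy - gy_prev))   # minimizer descends
        x, y = x / x.sum(), y / y.sum()
        gx_prev, gy_prev = gx, gy
    return x, y

A = np.array([[3.0, -1.0], [-2.0, 1.0]])   # unique Nash at x=(3/7,4/7), y=(2/7,5/7)
x, y = omwu(A)
# Last-iterate duality gap: max_i (A y)_i - min_j (x^T A)_j.
gap = (A @ y).max() - (x @ A).min()
```

On this benign instance the last iterate approaches the equilibrium; the paper's point is that one can construct 2×2 games where this convergence is arbitrarily slow for OMWU and its relatives.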
NeuroBOLT: Resting-state EEG-to-fMRI Synthesis with Multi-dimensional Feature Mapping
Functional magnetic resonance imaging (fMRI) is an indispensable tool in modern neuroscience, providing a non-invasive window into whole-brain dynamics at millimeter-scale spatial resolution. However, fMRI is constrained by issues such as high operation costs and immobility. With the rapid advancements in cross-modality synthesis and brain decoding, the use of deep neural networks has emerged as a promising solution for inferring whole-brain, high-resolution fMRI features directly from electroencephalography (EEG), a more widely accessible and portable neuroimaging modality. Nonetheless, the complex projection from neural activity to fMRI hemodynamic responses and the spatial ambiguity of EEG pose substantial challenges in both modeling and interpretability. Relatively few studies to date have developed approaches for EEG-fMRI translation, and although they have made significant strides, the inference of fMRI signals in a given study has been limited to a small set of brain areas and to a single condition (i.e., either resting-state or a specific task). The capability to predict fMRI signals in other brain areas, as well as to generalize across conditions, remains a critical gap in the field.
Supplement: Novel Upper Bounds for the Constrained Most Probable Explanation Task
It is well known that any MPE task can be encoded as an integer linear programming (ILP) problem. A popular formulation is to associate a Boolean variable with each entry in each function of the log-linear model: when the Boolean variable is assigned the value 1, the entry is selected, and otherwise it is not. The first type of consistency constraint encodes the restriction that exactly one entry from each function must be selected. A second type of consistency constraint ensures that if two functions share a variable, then only entries which assign the shared variable to the same value are selected.
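The encoding above can be made concrete on a toy model. The following sketch (with an illustrative two-factor model of my own construction) enumerates 0/1 selections of factor entries, enforces the two consistency constraints, and checks that the result matches a direct maximization over joint assignments:

```python
from itertools import product

# Tiny log-linear model: factor f1 over (A,), factor f2 over (A, B);
# table values are log-potentials.
f1 = {(0,): 0.1, (1,): 0.7}
f2 = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.0, (1, 1): 0.9}
factors = [(("A",), f1), (("A", "B"), f2)]

def mpe_by_ilp_enumeration(factors):
    """Solve the 0/1 program (one Boolean per factor entry) by brute force:
    pick exactly one entry per factor, keep only selections whose entries
    agree on shared variables, and maximize the sum of log-potentials."""
    entry_lists = [list(tbl.items()) for _, tbl in factors]
    best, best_val = None, float("-inf")
    for choice in product(*entry_lists):          # first constraint: one entry per factor
        assignment, ok = {}, True
        for (scope, _), (entry, _) in zip(factors, choice):
            for var, val in zip(scope, entry):    # second constraint: shared vars agree
                if assignment.setdefault(var, val) != val:
                    ok = False
        if ok:
            total = sum(logv for _, logv in choice)
            if total > best_val:
                best_val, best = total, assignment
    return best, best_val

def mpe_direct(factors):
    """Reference solver: maximize over joint assignments directly."""
    vars_ = sorted({v for scope, _ in factors for v in scope})
    best, best_val = None, float("-inf")
    for vals in product([0, 1], repeat=len(vars_)):
        a = dict(zip(vars_, vals))
        total = sum(tbl[tuple(a[v] for v in scope)] for scope, tbl in factors)
        if total > best_val:
            best_val, best = total, a
    return best, best_val
```

A real ILP solver replaces the enumeration with linear constraints (entry indicators per factor summing to 1, and equality of indicator sums per shared-variable value), but the feasible set is the same.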
Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
Successful applications of a machine learning algorithm require the algorithm to generalize well to unseen data. Thus, understanding and bounding the generalization error of machine learning algorithms is an area of intense theoretical interest and practical importance. The single most popular approach to modern machine learning relies on the use of continuous optimization techniques to optimize the appropriate loss function, most notably the stochastic (sub)gradient descent (SGD) method. Yet the generalization properties of SGD are still not well understood. Consider the setting of stochastic convex optimization (SCO).
Uniform stability is a notion of algorithmic stability that bounds the worst-case change in the model output by the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. [20] provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important progress in understanding the generalization properties of SGD and to several applications in differentially private convex optimization for smooth losses. Our work is the first to address uniform stability of SGD on nonsmooth convex losses. Specifically, we provide sharp upper and lower bounds for several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex losses.
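The quantity being bounded can be illustrated empirically: run SGD with the same randomness on two datasets differing in one point and measure how far the outputs drift. A minimal sketch on the nonsmooth, 1-Lipschitz convex loss l(w; x) = |w − x| (the loss, step size, and datasets are illustrative choices, not the paper's constructions):

```python
import random

def sgd(data, lr=0.01, epochs=5, seed=0):
    """Subgradient descent on l(w; x) = |w - x|, with a seeded sampling
    order so that two runs on neighboring datasets are coupled."""
    rng = random.Random(seed)
    w = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            g = 1.0 if w > data[i] else (-1.0 if w < data[i] else 0.0)
            w -= lr * g                  # subgradient step
    return w

S = [0.0, 1.0, 2.0, 3.0]
S_prime = list(S)
S_prime[0] = 10.0                        # replace a single data point
# Uniform-stability-style gap: drift of the output under the replacement.
gap = abs(sgd(S) - sgd(S_prime))
```

Each coupled step can separate the two trajectories by at most 2·lr, so the drift here is small; the paper's contribution is to pin down exactly how large this gap can be in the worst case for nonsmooth losses, where the smooth-case argument of [20] no longer applies.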