Separations in the Representational Capabilities of Transformers and Recurrent Architectures
Michael Hahn, Phil Blunsom
Transformer architectures have been widely adopted in foundation models. Due to their high inference costs, there is renewed interest in exploring the potential of efficient recurrent architectures (RNNs). In this paper, we analyze the differences in the representational capabilities of Transformers and RNNs across several tasks of practical relevance, including index lookup, nearest neighbor, recognizing bounded Dyck languages, and string equality. For the tasks considered, our results show separations based on the size of the model required for different architectures. For example, we show that a one-layer Transformer of logarithmic width can perform index lookup, whereas an RNN requires a hidden state of linear size. Conversely, while constant-size RNNs can recognize bounded Dyck languages, we show that one-layer Transformers require linear size for this task. Furthermore, we show that two-layer Transformers of logarithmic size can perform decision tasks such as string equality or disjointness, whereas both one-layer Transformers and recurrent models require linear size for these tasks. We also show that a log-size two-layer Transformer can implement the nearest neighbor algorithm in its forward pass; on the other hand, recurrent models require linear size. Our constructions are based on the existence of N nearly orthogonal vectors in O(log N)-dimensional space, and our lower bounds are based on reductions from communication complexity problems.
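The near-orthogonality fact behind these constructions, that O(log N) dimensions suffice to host N nearly orthogonal vectors, is easy to check numerically. The following sketch (ours, not from the paper) samples random sign vectors in 16·log2(N) dimensions and inspects their pairwise inner products:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256                      # number of vectors
d = 16 * int(np.log2(N))     # O(log N) dimensions; the constant 16 gives a comfortable margin

# Random sign vectors scaled to unit norm: entries are +-1/sqrt(d).
V = rng.choice([-1.0, 1.0], size=(N, d)) / np.sqrt(d)

G = V @ V.T                          # Gram matrix: diagonal is 1, off-diagonal concentrates near 0
off_diag = np.abs(G - np.eye(N))
print(f"{N} unit vectors in {d} dims; max |<v_i, v_j>| for i != j: {off_diag.max():.3f}")
```

By a standard concentration bound, each off-diagonal inner product exceeds ε in magnitude with probability at most 2·exp(-dε²/2), so d = O(log N) keeps all pairs small simultaneously.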
Polynomially Over-Parameterized Convolutional Neural Networks Contain Structured Strong Winning Lottery Tickets
Arthur da Cunha, Université Côte d'Azur, Inria, CNRS, I3S, Aarhus University, Aarhus, Denmark, dac@cs.au.dk · Francesco d'Amore, Aalto University, Bocconi University, Espoo, Finland, francesco.damore@aalto.fi · Emanuele Natale, Université Côte d'Azur, Inria, CNRS, I3S, Sophia Antipolis, France, emanuele.natale@inria.fr
The Strong Lottery Ticket Hypothesis (SLTH) states that randomly-initialised neural networks likely contain subnetworks that perform well without any training. Although unstructured pruning has been extensively studied in this context, its structured counterpart, which can deliver significant computational and memory efficiency gains, has been largely unexplored. One of the main reasons for this gap is the limitations of the underlying mathematical tools used in formal analyses of the SLTH. In this paper, we overcome these limitations: we leverage recent advances in the multidimensional generalisation of the Random Subset-Sum Problem and obtain a variant that admits the stochastic dependencies that arise when addressing structured pruning in the SLTH. We apply this result to prove, for a wide class of random Convolutional Neural Networks, the existence of structured subnetworks that can approximate any sufficiently smaller network. This result provides the first sub-exponential bound around the SLTH for structured pruning, opening up new avenues for further research on the hypothesis and contributing to the understanding of the role of over-parameterization in deep learning.
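The one-dimensional Random Subset-Sum phenomenon underlying such SLTH analyses can be illustrated with a toy enumeration; the multidimensional, dependence-tolerant variant the paper develops is far more involved than this sketch (our example, not the paper's construction). With n random weights, essentially every target in [-1, 1] is approximated by some subset sum with error shrinking exponentially in n:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 18
weights = rng.uniform(-1.0, 1.0, size=n)

# Enumerate all 2^n subset sums by iterative doubling:
# after processing weight w, every previous sum appears with and without w.
sums = np.zeros(1)
for w in weights:
    sums = np.concatenate([sums, sums + w])

# Targets in [-1, 1] are approximated by some subset sum with tiny error.
for target in (-0.7, 0.0, 0.42, 0.9):
    err = np.abs(sums - target).min()
    print(f"target {target:+.2f}: best subset-sum error {err:.2e}")
```

This exponential density of subset sums is what lets a modestly over-parameterized random network hide an accurate subnetwork; the structured-pruning case requires handling the stochastic dependencies across the shared random weights.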
TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition
Creation of 3D content by stylization is a promising yet challenging problem in computer vision and graphics research. In this work, we focus on stylizing photorealistic appearance renderings of a given surface mesh of arbitrary topology. Motivated by the recent surge of cross-modal supervision of the Contrastive Language-Image Pre-training (CLIP) model, we propose TANGO, which transfers the appearance style of a given 3D shape according to a text prompt in a photorealistic manner. Technically, we propose to disentangle the appearance style into the spatially varying bidirectional reflectance distribution function, the local geometric variation, and the lighting condition, which are jointly optimized, via supervision of the CLIP loss, by a spherical-Gaussian-based differentiable renderer. As such, TANGO enables photorealistic 3D style transfer by automatically predicting reflectance effects even for bare, low-quality meshes, without training on a task-specific dataset. Extensive experiments show that TANGO outperforms existing methods of text-driven 3D style transfer in terms of photorealistic quality, consistency of 3D geometry, and robustness when stylizing low-quality meshes. Our code and results are available at our project webpage https://cyw-3d.github.io/tango.
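The lighting component here builds on spherical Gaussians, which have a standard closed form: a single lobe is G(v; μ, λ, a) = a · exp(λ(v·μ − 1)) for unit direction v, lobe axis μ, sharpness λ, and amplitude a. A minimal evaluation sketch (function name and toy values are ours, not from the paper):

```python
import numpy as np

def spherical_gaussian(v, mu, lam, a):
    """Evaluate one spherical Gaussian lobe: a * exp(lam * (v . mu - 1)).

    v, mu: unit direction vectors; lam: sharpness; a: amplitude.
    The lobe peaks at v == mu and decays smoothly away from the axis.
    """
    return a * np.exp(lam * (np.dot(v, mu) - 1.0))

mu = np.array([0.0, 0.0, 1.0])                         # lobe axis
peak = spherical_gaussian(mu, mu, lam=10.0, a=1.0)     # value on-axis
side = spherical_gaussian(np.array([1.0, 0.0, 0.0]), mu, lam=10.0, a=1.0)
print(f"on-axis: {peak:.4f}, perpendicular: {side:.2e}")
```

Because the exponent is linear in v·μ, products and integrals of such lobes stay in closed form, which is what makes the rendering step differentiable and cheap.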
Will Elden Ring film be 'awesome' or 'meh'? Fans have thoughts
Elden Ring is a role-playing adventure game set in the war-torn, devastated Lands Between, where players must collect runes which represent that world's order and laws, in order to restore it and become the Elden Lord. TikToker Everythingethan added a note of caution, saying: "I want to know what part of the timeline we're adapting... I don't know if I want to see this live action. I think it would be kind of cursed at times. I think animation is the best way to adapt video games nine times out of 10."
5 AI terms you keep hearing and what they actually mean
Tyler Saltsman, founder and CEO of EdgeRunner AI, warns that creating artificial general intelligence could "destroy the world as we know it." Whether it's powering your phone's autocorrect or helping someone create a new recipe with a few words, artificial intelligence (AI) is everywhere right now. But if you're still nodding along when someone mentions "neural networks" or "generative AI," you're not alone. Today I am breaking down five buzzy AI terms that you've probably seen in headlines, group chats or app updates, minus the tech talk. Understanding these basics will help you talk AI with confidence, even if you're not a programmer.
Global Convergence to Local Minmax Equilibrium in Classes of Nonconvex Zero-Sum Games
We study gradient descent-ascent learning dynamics with timescale separation (τ-GDA) in unconstrained continuous-action zero-sum games where the minimizing player faces a nonconvex optimization problem and the maximizing player optimizes a Polyak-Łojasiewicz (PŁ) or strongly-concave (SC) objective. In contrast to past work on gradient-based learning in nonconvex-PŁ/SC zero-sum games, we assess convergence in relation to natural game-theoretic equilibria instead of only notions of stationarity. In pursuit of this goal, we prove that the only locally stable points of the τ-GDA continuous-time limiting system correspond to strict local minmax equilibria in each class of games. For these classes of games, we exploit timescale separation to construct a potential function that, when combined with the stability characterization and an asymptotic saddle-avoidance result, gives a global asymptotic almost-sure convergence guarantee for the discrete-time gradient descent-ascent update to the set of strict local minmax equilibria. Moreover, we provide convergence rates for the gradient descent-ascent dynamics with timescale separation to approximate stationary points.
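The τ-GDA update itself is simple: the maximizing player takes steps τ times larger than the minimizing player. A minimal sketch on a toy quadratic game (convex-strongly-concave, chosen only to show the timescale-separated update rule, not the nonconvex setting analyzed above):

```python
# Toy zero-sum objective f(x, y) = x*y - y**2/2:
# strongly concave in y, with unique minmax equilibrium at (0, 0).
def grad_x(x, y):
    return y            # df/dx

def grad_y(x, y):
    return x - y        # df/dy

eta_x = 0.05
tau = 10.0              # timescale separation: the ascent step is tau times larger
eta_y = tau * eta_x

x, y = 1.0, 0.5
for _ in range(2000):
    # simultaneous descent on x, ascent on y
    x, y = x - eta_x * grad_x(x, y), y + eta_y * grad_y(x, y)

print(f"converged to (x, y) = ({x:.2e}, {y:.2e})")
```

On this instance the iteration matrix has spectral radius below 1, so the iterates contract to the equilibrium; the paper's contribution is characterizing when analogous convergence to strict local minmax equilibria holds in the nonconvex-PŁ/SC classes.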
David Bertoin
Deep reinforcement learning policies, despite their outstanding efficiency in simulated visual control tasks, have shown disappointing ability to generalize across disturbances in the input training images. Changes in image statistics or distracting background elements are pitfalls that prevent generalization and real-world applicability of such control policies. We elaborate on the intuition that a good visual policy should be able to identify which pixels are important for its decision, and preserve this identification of important sources of information across images. This implies that training of a policy with a small generalization gap should focus on such important pixels and ignore the others. This leads to the introduction of saliency-guided Q-networks (SGQN), a generic method for visual reinforcement learning that is compatible with any value function learning method. SGQN vastly improves the generalization capability of Soft Actor-Critic agents and outperforms existing state-of-the-art methods on the DeepMind Control Generalization benchmark, setting a new reference in terms of training efficiency, generalization gap, and policy interpretability.
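The core intuition, that a robust value function should depend only on a small set of salient pixels, can be made concrete with a toy linear stand-in for the Q-network (a real SGQN uses a deep Q-network and autograd-computed saliency; everything below is our illustrative assumption, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in "Q-network": a linear map over 64 flattened pixels where
# only 12 pixels actually influence the value.
n_pixels = 64
important = rng.choice(n_pixels, size=12, replace=False)
w = np.zeros(n_pixels)
w[important] = rng.normal(size=12)

def q_value(s):
    return float(w @ s)

s = rng.random(n_pixels)              # an observation (flattened image)
saliency = np.abs(w)                  # |dQ/ds| for this linear Q
top = np.argsort(saliency)[-16:]      # keep the 16 most salient pixels
mask = np.zeros(n_pixels, dtype=bool)
mask[top] = True

masked = s * mask                     # zero out the non-salient pixels
print(f"Q(s) = {q_value(s):.3f}, Q(masked s) = {q_value(masked):.3f}")
```

Masking away the low-saliency pixels leaves the value unchanged, which is exactly the invariance a saliency-guided training signal tries to enforce against background distractors.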
Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner
Flexible and accurate drag-based editing is a challenging task that has recently garnered significant attention. Current methods typically model this problem as automatically learning "how to drag" through point dragging and often produce one deterministic estimation, which presents two key limitations: 1) overlooking the inherently ill-posed nature of drag-based editing, where multiple results may correspond to a given input, as illustrated in Figure 1; 2) ignoring the constraint of image quality, which may lead to unexpected distortion. To alleviate this, we propose LucidDrag, which shifts the focus from "how to drag" to a "what-then-how" paradigm. LucidDrag comprises an intention reasoner and a collaborative guidance sampling mechanism. The former infers several optimal editing strategies, identifying what content and what semantic direction should be edited. Based on the former, the latter addresses "how to drag" by collaboratively integrating existing editing guidance with the newly proposed semantic guidance and quality guidance. Specifically, semantic guidance is derived by establishing a semantic editing direction based on reasoned intentions, while quality guidance is achieved through classifier guidance using an image fidelity discriminator. Both qualitative and quantitative comparisons demonstrate the superiority of LucidDrag over previous methods.
Save over $100 on Sony XM4 headphones ahead of Memorial Day
SAVE $120: As of May 23, Sony WH-1000XM4 headphones are on sale for $228 at Amazon. If you're looking for a seriously high-quality pair of headphones, you won't want to miss this great deal on Sony XM4s. With premium noise cancellation, stellar sound quality, and Alexa voice control, these headphones are next level. And as of May 23, you can get them for less: at Amazon, they are currently on sale for $228, saving you $120 on the list price.
Forget Cocomelon--this kids' app won't rot their brains
If your child loves their tablet, but you struggle with finding appropriate games, try Pok Pok, a learning app for kids aged 2-8 that doesn't feel like learning. It features a collection of calming, open-ended digital toys that help children explore STEM, problem-solving, creativity, and more without ads, in-app purchases, or overstimulation. Built by parents in collaboration with early childhood experts, Pok Pok offers a Montessori-inspired experience that supports healthy screen time and lifelong learning. Kids using Pok Pok build foundational skills in STEM, problem-solving, language, numbers, cause and effect, and emotional development. Each game is open-ended, so there's no "winning" or "losing."