

Young Palestinian women learn AI to tell stories of war on Gaza

Al Jazeera

Young Palestinian women in Gaza are learning to use artificial intelligence to create short films and tell stories about their life during the war.


Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

Neural Information Processing Systems

Vision-Language models (VLMs) show impressive abilities to answer questions on visual inputs (e.g., counting objects in an image), yet demonstrate higher accuracies when performing an analogous task on text (e.g., counting words in a text). We investigate this accuracy gap by identifying and comparing the circuits--the task-specific computational sub-graphs--in different modalities. We show that while circuits are largely disjoint between modalities, they implement relatively similar functionalities: the differences lie primarily in processing modality-specific data positions (an image or a text sequence). Zooming in on the image data representations, we observe they become aligned with the higher-performing analogous textual representations only towards later layers, too late in processing to effectively influence subsequent positions. To overcome this, we patch the representations of visual data tokens from later layers back into earlier layers. In experiments with multiple tasks and models, this simple intervention closes a third of the performance gap between the modalities, on average.
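The back-patching idea described above can be illustrated on a toy model. The sketch below is not the authors' code: the residual stack, the `causal_mean` mixing step (a crude stand-in for causal attention), the layer indices, and the choice of which positions count as "image" tokens are all assumptions made for illustration.

```python
import numpy as np

def causal_mean(h):
    # Each position averages itself and all earlier positions
    # (hypothetical stand-in for causal attention).
    counts = np.arange(1, h.shape[0] + 1)[:, None]
    return np.cumsum(h, axis=0) / counts

def forward(x, weights, patch=None):
    """Run a toy residual stack; optionally overwrite hidden states at one layer's input."""
    h = x.copy()
    layer_inputs = []
    for i, W in enumerate(weights):
        if patch is not None and patch["layer"] == i:
            h = h.copy()
            h[patch["positions"]] = patch["values"]
        layer_inputs.append(h.copy())  # hidden state entering layer i
        h = h + np.tanh(causal_mean(h) @ W)
    return h, layer_inputs

rng = np.random.default_rng(0)
seq_len, dim, n_layers = 6, 8, 6
weights = [rng.normal(scale=0.3, size=(dim, dim)) for _ in range(n_layers)]
x = rng.normal(size=(seq_len, dim))

image_positions = np.array([0, 1, 2])  # assumed "visual data token" positions
early, late = 1, 4                     # assumed layer indices

# 1) Clean run: record the hidden states entering every layer.
clean_out, clean_inputs = forward(x, weights)

# 2) Patched run: feed late-layer image representations into an early layer.
patch = {
    "layer": early,
    "positions": image_positions,
    "values": clean_inputs[late][image_positions],
}
patched_out, patched_inputs = forward(x, weights, patch=patch)
```

Because later positions read from the image positions through the causal mixing step, the patched run changes the final representation of the last token even though only the image positions were overwritten.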


Solving Neural Min-Max Games: The Role of Architecture, Initialization & Dynamics

Neural Information Processing Systems

Many emerging applications--such as adversarial training, AI alignment, and robust optimization--can be framed as zero-sum games between neural nets, with von Neumann-Nash equilibria (NE) capturing the desirable system behavior. While such games often involve non-convex non-concave objectives, empirical evidence shows that simple gradient methods frequently converge, suggesting a hidden geometric structure. In this paper, we provide a theoretical framework that explains this phenomenon through the lens of hidden convexity and overparameterization. We identify sufficient conditions--spanning initialization, training dynamics, and network width--that guarantee global convergence to an NE in a broad class of non-convex min-max games. To our knowledge, this is the first such result for games that involve two-layer neural networks. Technically, our approach is twofold: (a) we derive a novel path-length bound for the alternating gradient descent-ascent scheme in min-max games; and (b) we show that the reduction from a hidden convex-concave geometry to a two-sided Polyak-Łojasiewicz (PL) min-max condition holds with high probability under overparameterization, using tools from random matrix theory.
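For readers unfamiliar with the alternating gradient descent-ascent scheme mentioned above, here is a minimal sketch on a scalar toy objective. The objective f(x, y) = 0.5x² + xy − 0.5y², the step size, and the function name `alternating_gda` are assumptions chosen for illustration; the paper's setting involves neural networks, not this quadratic.

```python
def alternating_gda(x0, y0, lr=0.1, steps=500):
    """Alternating gradient descent-ascent on f(x, y) = 0.5*x**2 + x*y - 0.5*y**2.

    The min player controls x, the max player controls y.
    The unique saddle point of this toy objective is (0, 0).
    """
    x, y = x0, y0
    for _ in range(steps):
        # Descent step for the min player: df/dx = x + y
        x = x - lr * (x + y)
        # Ascent step for the max player, using the already-updated x: df/dy = x - y
        y = y + lr * (x - y)
    return x, y

x_final, y_final = alternating_gda(1.0, -2.0)
```

The "alternating" part is that the max player reacts to the min player's fresh iterate, in contrast to simultaneous updates where both players use the previous iterate.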


FAN: Fourier Analysis Networks

Neural Information Processing Systems

Despite the remarkable successes of general-purpose neural networks, such as MLPs and Transformers, we find that they exhibit notable shortcomings in modeling and reasoning about periodic phenomena, achieving only marginal performance within the training domain and failing to generalize effectively to out-of-domain (OOD) scenarios. Periodicity is ubiquitous throughout nature and science. Therefore, neural networks should be equipped with the essential ability to model and handle periodicity. In this work, we propose FAN, a novel neural network that effectively addresses periodicity modeling challenges while offering broad applicability similar to MLP with fewer parameters and FLOPs. Periodicity is naturally integrated into FAN's structure and computational processes by introducing the Fourier Principle. Unlike existing Fourier-based networks, which possess particular periodicity modeling abilities but face challenges in scaling to deeper networks and are typically designed for specific tasks, our approach overcomes this challenge to enable scaling to large-scale models and maintains the capability to be applied to more types of tasks. Through extensive experiments, we demonstrate the superiority of FAN in periodicity modeling tasks and the effectiveness and generalizability of FAN across a range of real-world tasks. Moreover, we reveal that compared to existing Fourier-based networks, FAN accommodates both periodicity modeling and general-purpose modeling well.
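The abstract does not spell out the layer, so the following is only a hypothetical sketch of one way to build periodicity into an MLP-style layer: part of the output passes through cosine and sine, and the rest through an ordinary nonlinearity. The function name `fourier_layer`, the parameter names, the ReLU choice, and the dimensions are all assumptions.

```python
import numpy as np

def fourier_layer(x, W_p, W_g, b_g):
    """Hypothetical layer mixing periodic and general-purpose features.

    x:   (batch, d_in) inputs
    W_p: (d_in, d_p) weights feeding the cos/sin branch
    W_g: (d_in, d_g) weights feeding the ordinary branch
    b_g: (d_g,) bias of the ordinary branch
    Returns an array of shape (batch, 2 * d_p + d_g).
    """
    periodic = x @ W_p                        # shared pre-activation for cos and sin
    general = np.maximum(x @ W_g + b_g, 0.0)  # ReLU branch (assumed nonlinearity)
    return np.concatenate([np.cos(periodic), np.sin(periodic), general], axis=-1)

rng = np.random.default_rng(0)
d_in, d_p, d_g = 4, 3, 5
W_p = rng.normal(size=(d_in, d_p))
W_g = rng.normal(size=(d_in, d_g))
b_g = np.zeros(d_g)
x = rng.normal(size=(10, d_in))
out = fourier_layer(x, W_p, W_g, b_g)
```

Sharing `W_p` between the cosine and sine branches keeps the parameter count below that of a dense layer with the same output width, which is in the spirit of the "fewer parameters" claim, though the exact design here is not taken from the source.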


On the rankability of visual embeddings

Neural Information Processing Systems

We study whether visual embedding models capture continuous, ordinal attributes along linear directions, which we term rank axes. We define a model as rankable for an attribute if projecting embeddings onto such an axis preserves the attribute's order. Across 7 popular encoders and 9 datasets with attributes like age, crowd count, head pose, aesthetics, and recency, we find that many embeddings are inherently rankable. Surprisingly, a small number of samples, or even just two extreme examples, often suffice to recover meaningful rank axes, without full-scale supervision. These findings open up new use cases for image ranking in vector databases and motivate further study into the structure and learning of rankable embeddings.
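A minimal sketch of the "two extreme examples" idea, on synthetic data: the rank axis is taken as the normalized difference between the embeddings of the highest- and lowest-attribute samples, and rankability is checked with a Spearman rank correlation. The synthetic embeddings, the noise level, and the `spearman` helper are assumptions; real encoders and datasets would replace them.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 16

# Hypothetical ordinal attribute (e.g. age) and synthetic embeddings that
# encode it linearly along a hidden direction, plus isotropic noise.
attribute = rng.uniform(0.0, 100.0, size=n)
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
embeddings = attribute[:, None] * direction[None, :] + rng.normal(size=(n, d))

# Rank axis from just two extreme examples.
lo, hi = np.argmin(attribute), np.argmax(attribute)
axis = embeddings[hi] - embeddings[lo]
axis /= np.linalg.norm(axis)

# Project every embedding onto the axis to obtain a scalar ranking score.
scores = embeddings @ axis

def spearman(a, b):
    # Spearman correlation as the Pearson correlation of ranks (no ties assumed).
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

rho = spearman(scores, attribute)
```

A value of `rho` close to 1 means that sorting by the projection reproduces the attribute's order, which is the sense of "rankable" defined above.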


AI will create more jobs for humans, not replace them, Amazon founder Bezos says

BBC News

AI will increase the need for workers rather than make people redundant, Amazon founder Jeff Bezos predicted during an appearance at a tech conference in Paris. Bezos pushed back against growing concerns that AI will replace large numbers of workers. Instead he argued that the tech will unlock new opportunities and increase demand for human labour. This contradicts some other tech and political figures, including former UK prime minister Rishi Sunak, now an adviser to Microsoft and AI firm Anthropic, who recently said AI was having an impact on young people's job prospects. "I know there's a lot of concern that many people have, including many smart people, that AI is going to make humans redundant and so on," Bezos said.



Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing

Neural Information Processing Systems

Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model's performance. While prior works have demonstrated the benefits of specific heuristic retraining schemes, the question of how to optimally combine the model's predictions and the provided labels remains largely open.


The Parameterized Complexity of Computing the VC-Dimension

Neural Information Processing Systems

The VC-dimension is a well-studied and fundamental complexity measure of a set system (or hypergraph) that is central to many areas of machine learning. We establish several new results on the complexity of computing the VC-dimension. In particular, given a hypergraph $H = (V,E)$, we prove that the naive $2^{O(|V|)}$-time algorithm is asymptotically tight under the Exponential Time Hypothesis (ETH). We then prove that the problem admits a 1-additive fixed-parameter approximation algorithm when parameterized by the maximum degree of $H$ and a fixed-parameter algorithm when parameterized by its dimension, and that these are essentially the only such exploitable structural parameters.
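To make the quantity concrete, here is a brute-force sketch of computing the VC-dimension of a small hypergraph. A vertex set S is shattered when every subset of S arises as the intersection of S with some edge. The function names are hypothetical, and the cap on the subset size (a shattered set of size k needs at least 2^k distinct edges) is a standard observation added here, not something stated in the abstract.

```python
from itertools import combinations

def is_shattered(subset, edges):
    # S is shattered iff the traces {e & S} cover all 2**|S| subsets of S.
    traces = {e & subset for e in edges}
    return len(traces) == 2 ** len(subset)

def vc_dimension(vertices, edges):
    """Brute-force VC-dimension of the hypergraph (vertices, edges).

    Assumes at least one edge, so that the empty set is shattered.
    """
    edges = [frozenset(e) for e in edges]
    # Shattering k vertices requires 2**k distinct traces, hence k <= log2(|E|).
    max_k = len(edges).bit_length() - 1
    best = 0
    for k in range(1, max_k + 1):
        if any(is_shattered(frozenset(s), edges) for s in combinations(vertices, k)):
            best = k
        else:
            # Subsets of shattered sets are shattered, so no larger set can work.
            break
    return best

# Every subset of {1, 2} appears as an edge, so {1, 2} is shattered.
vc_a = vc_dimension([1, 2, 3], [set(), {1}, {2}, {1, 2}])
# Three singleton edges shatter single vertices but no pair.
vc_b = vc_dimension([1, 2, 3], [{1}, {2}, {3}])
```

The running time is still exponential in the worst case, since the number of candidate subsets grows combinatorially with the number of vertices.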


Bridging Scales: Spectral Theory Reveals How Local Connectivity Rules Sculpt Global Neural Dynamics in Spatially Extended Networks

Neural Information Processing Systems

The brain's diverse spatiotemporal activity patterns are fundamental to cognition and consciousness, yet how these macroscopic dynamics emerge from microscopic neural circuitry remains a critical challenge. We take a step in this direction by developing a spatially extended neural network model integrated with a spectral theory of its connectivity matrix. Our theory quantitatively demonstrates how local structural parameters, such as E/I neuron projection ranges, connection strengths, and density, determine distinct features of the eigenvalue spectrum, specifically outlier eigenvalues and a bulk disk. These spectral signatures, in turn, precisely predict the network's emergent global dynamical regime, encompassing asynchronous states, synchronous states, oscillations, localized activity bumps, traveling waves, and chaos. Motivated by observations of shifting cortical dynamics in mice across arousal states, our framework not only provides a possible explanation for this repertoire of behaviors but also offers a principled starting point for inferring underlying effective connectivity changes from macroscopic brain activity. By mechanistically linking neural structure to dynamics, this work advances a principled framework for dissecting how large-scale activity patterns--central to cognition and open questions in consciousness research--arise from, and constrain, local circuitry.
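The "outlier eigenvalues and a bulk disk" picture can be seen in a much simpler setting than the paper's spatially extended E/I model. The sketch below uses an assumed generic random connectivity matrix: independent Gaussian weights with standard deviation g/√N, plus a uniform mean offset mu/N. The parameter values and variable names are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, g, mu = 400, 0.5, 2.0  # network size, random coupling strength, mean coupling

# Random part: eigenvalues fill a disk of radius roughly g in the complex plane.
# Mean part: the rank-one offset mu/N contributes one outlier near mu.
J = rng.normal(scale=g / np.sqrt(N), size=(N, N)) + mu / N

eigvals = np.linalg.eigvals(J)
order = np.argsort(-np.abs(eigvals))        # sort by modulus, largest first
outlier = eigvals[order[0]]                 # isolated eigenvalue
bulk_radius = np.abs(eigvals[order[1]])     # edge of the remaining bulk
```

In linearized rate models of this kind, an eigenvalue with real part above the stability threshold typically signals a change of dynamical regime, which is why the position of outliers relative to the bulk is informative.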