Towards Understanding Transformers in Learning Random Walks

Neural Information Processing Systems

Transformers have proven highly effective across various applications, especially in handling sequential data such as natural languages and time series. However, transformer models often lack clear interpretability, and the success of transformers has not been well understood in theory. In this paper, we study the capability and interpretability of transformers in learning a family of classic statistical models, namely random walks on circles. We theoretically demonstrate that, after training with gradient descent, a one-layer transformer model can achieve optimal accuracy in predicting random walks. Importantly, our analysis reveals that the trained model is interpretable: the trained softmax attention serves as a token selector, focusing on the direct parent state; subsequently, the value matrix executes a one-step probability transition to predict the location of the next state based on this parent state. We also show that certain edge cases not covered by our theory are indeed failure cases, demonstrating that our theoretical conditions are tight. By investigating these success and failure cases, we reveal that gradient descent with small initialization may fail or struggle to converge to a good solution in certain simple tasks even beyond random walks. Experiments are conducted to support our theoretical findings.
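The "select the parent state, then apply a one-step transition" mechanism described in the abstract can be illustrated with a small sketch. The abstract does not give the walk's parametrization, so the version below assumes a walk on `num_states` points that moves one step clockwise with probability `p_clockwise` and one step counterclockwise otherwise; the function names are hypothetical and this is not the paper's model or training code.

```python
def circle_transition_matrix(num_states, p_clockwise):
    """Row i is the distribution of the next state given current state i.

    Assumed walk: +1 (mod num_states) with probability p_clockwise,
    -1 (mod num_states) with probability 1 - p_clockwise.
    """
    matrix = [[0.0] * num_states for _ in range(num_states)]
    for state in range(num_states):
        matrix[state][(state + 1) % num_states] += p_clockwise
        matrix[state][(state - 1) % num_states] += 1.0 - p_clockwise
    return matrix


def predict_next_distribution(parent_state, matrix):
    """Multiplying a one-hot parent state by the matrix selects one row."""
    return matrix[parent_state]


def most_likely_next(parent_state, matrix):
    """Highest-probability next state, i.e. the optimal point prediction."""
    dist = predict_next_distribution(parent_state, matrix)
    return max(range(len(dist)), key=dist.__getitem__)


transition = circle_transition_matrix(num_states=5, p_clockwise=0.7)
prediction = most_likely_next(parent_state=4, matrix=transition)
```

With these assumed values the walk wraps around the circle, so the most likely successor of state 4 is state 0.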


Table2LaTeX-RL: High-Fidelity LaTeX Code Generation from Table Images via Reinforced Multimodal Language Models

Neural Information Processing Systems

In this work, we address the task of table image to LaTeX code generation, with the goal of automating the reconstruction of high-quality, publication-ready tables from visual inputs. A central challenge of this task lies in accurately handling complex tables--those with large sizes, deeply nested structures, and semantically rich or irregular cell content--where existing methods often fail. We begin with a comprehensive analysis, identifying key challenges and highlighting the limitations of current evaluation protocols. To overcome these issues, we propose a reinforced multimodal large language model (MLLM) framework, where a pre-trained MLLM is fine-tuned on a large-scale table-to-LaTeX dataset. To further improve generation quality, we introduce a dual-reward reinforcement learning strategy based on Group Relative Policy Optimization (GRPO). Unlike standard approaches that optimize purely over text outputs, our method incorporates both a structure-level reward on LaTeX code and a visual fidelity reward computed from rendered outputs, enabling direct optimization of the visual output quality. We adopt a hybrid evaluation protocol combining TEDS-Structure and CW-SSIM, and show that our method achieves state-of-the-art performance, particularly on structurally complex tables, demonstrating the effectiveness and robustness of our approach.


The White House Wants Anthropic to Block All Jailbreaks. That May Not Be Possible

WIRED

Trump administration officials tell WIRED that if Anthropic wants to rerelease Fable 5, it will need to ensure the model's guardrails can't be circumvented. Security experts say that can't be done. The Trump administration's disagreement with Anthropic over its most advanced AI models appears to be fast coming to a head. Trump officials tell Inner Loop that if Anthropic wants to rerelease Claude Fable 5, the AI model that they took offline with export controls last week over concerns about jailbreaking--a method of using prompts to get around a model's safeguards--the company will need to take steps to actually address what the government alleges are vulnerabilities. Anthropic has said for days that the administration's concerns are overblown and that the effects of the jailbreaks are minimal.


Young Palestinian women learn AI to tell stories of war on Gaza

Al Jazeera

Young Palestinian women in Gaza are learning to use artificial intelligence to create short films and tell stories about their life during the war.


Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

Neural Information Processing Systems

Vision-Language models (VLMs) show impressive abilities to answer questions on visual inputs (e.g., counting objects in an image), yet demonstrate higher accuracies when performing an analogous task on text (e.g., counting words in a text). We investigate this accuracy gap by identifying and comparing the circuits--the task-specific computational sub-graphs--in different modalities. We show that while circuits are largely disjoint between modalities, they implement relatively similar functionalities: the differences lie primarily in processing modality-specific data positions (an image or a text sequence). Zooming in on the image data representations, we observe they become aligned with the higher-performing analogous textual representations only towards later layers, too late in processing to effectively influence subsequent positions. To overcome this, we patch the representations of visual data tokens from later layers back into earlier layers. In experiments with multiple tasks and models, this simple intervention closes a third of the performance gap between the modalities, on average.


Solving Neural Min-Max Games: The Role of Architecture, Initialization & Dynamics

Neural Information Processing Systems

Many emerging applications--such as adversarial training, AI alignment, and robust optimization--can be framed as zero-sum games between neural nets, with von Neumann-Nash equilibria (NE) capturing the desirable system behavior. While such games often involve non-convex non-concave objectives, empirical evidence shows that simple gradient methods frequently converge, suggesting a hidden geometric structure. In this paper, we provide a theoretical framework that explains this phenomenon through the lens of hidden convexity and overparameterization. We identify sufficient conditions--spanning initialization, training dynamics, and network width--that guarantee global convergence to a NE in a broad class of non-convex min-max games. To our knowledge, this is the first such result for games that involve two-layer neural networks. Technically, our approach is twofold: (a) we derive a novel path-length bound for the alternating gradient descent-ascent scheme in min-max games; and (b) we show that the reduction from a hidden convex-concave geometry to a two-sided Polyak-Łojasiewicz (PL) min-max condition holds with high probability under overparameterization, using tools from random matrix theory.


FAN: Fourier Analysis Networks

Neural Information Processing Systems

Despite the remarkable successes of general-purpose neural networks, such as MLPs and Transformers, we find that they exhibit notable shortcomings in modeling and reasoning about periodic phenomena, achieving only marginal performance within the training domain and failing to generalize effectively to out-of-domain (OOD) scenarios. Periodicity is ubiquitous throughout nature and science. Therefore, neural networks should be equipped with the essential ability to model and handle periodicity. In this work, we propose FAN, a novel neural network that effectively addresses periodicity modeling challenges while offering broad applicability similar to MLP with fewer parameters and FLOPs. Periodicity is naturally integrated into FAN's structure and computational processes by introducing the Fourier Principle. Unlike existing Fourier-based networks, which possess particular periodicity modeling abilities but face challenges in scaling to deeper networks and are typically designed for specific tasks, our approach overcomes this challenge to enable scaling to large-scale models and maintains the capability to be applied to more types of tasks. Through extensive experiments, we demonstrate the superiority of FAN in periodicity modeling tasks and the effectiveness and generalizability of FAN across a range of real-world tasks. Moreover, we reveal that compared to existing Fourier-based networks, FAN accommodates both periodicity modeling and general-purpose modeling well.


On the rankability of visual embeddings

Neural Information Processing Systems

We study whether visual embedding models capture continuous, ordinal attributes along linear directions, which we term rank axes. We define a model as rankable for an attribute if projecting embeddings onto such an axis preserves the attribute's order. Across 7 popular encoders and 9 datasets with attributes like age, crowd count, head pose, aesthetics, and recency, we find that many embeddings are inherently rankable. Surprisingly, a small number of samples, or even just two extreme examples, often suffice to recover meaningful rank axes, without full-scale supervision. These findings open up new use cases for image ranking in vector databases and motivate further study into the structure and learning of rankable embeddings.
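As a rough illustration of recovering a rank axis from two extreme examples, the sketch below takes the difference between the embedding of a high-attribute example and that of a low-attribute example, then orders items by their projection onto that direction. The embeddings and ages are made-up toy values and the function names are hypothetical; the abstract does not specify how the authors estimate the axis, so treat this as one plausible reading rather than their method.

```python
def rank_axis(low_embedding, high_embedding):
    """Direction pointing from the low-attribute example to the high one."""
    return [h - l for l, h in zip(low_embedding, high_embedding)]


def project(embedding, axis):
    """Dot product of an embedding with the rank axis."""
    return sum(e * a for e, a in zip(embedding, axis))


def rank_by_axis(embeddings, axis):
    """Indices of the embeddings, sorted by increasing projection."""
    return sorted(range(len(embeddings)),
                  key=lambda i: project(embeddings[i], axis))


# Toy 2-D "embeddings" with a made-up ordinal attribute (age in years).
embeddings = [[0.1, 0.9], [0.5, 0.2], [0.9, 0.4], [0.3, 0.7]]
ages = [10, 50, 90, 30]

# Use only the two extreme examples (youngest and oldest) to build the axis.
youngest = ages.index(min(ages))
oldest = ages.index(max(ages))
axis = rank_axis(embeddings[youngest], embeddings[oldest])

predicted_order = rank_by_axis(embeddings, axis)
true_order = sorted(range(len(ages)), key=ages.__getitem__)
```

On this toy data the projected order matches the true age order, which is the property the abstract calls rankability.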


AI will create more jobs for humans, not replace them, Amazon founder Bezos says

BBC News

AI will lead to more need for workers rather than make people redundant, Amazon founder Jeff Bezos predicted during an appearance at a tech conference in Paris. Bezos pushed back against growing concerns that AI will replace large numbers of workers. Instead he argued that the tech will unlock new opportunities and increase demand for human labour. This is in contradiction to some other tech and political figures - including former UK prime minister Rishi Sunak, now an adviser to Microsoft and AI firm Anthropic, who recently said AI was having an impact on young people's job prospects. "I know there's a lot of concern that many people have, including many smart people, that AI is going to make humans redundant and so on," Bezos said.