Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Neural Information Processing Systems

Understanding architectural differences in language models is challenging, especially at academic-scale pretraining (e.g., 1.3B parameters, 100B tokens), where results are often dominated by noise and randomness. To overcome this, we introduce controlled synthetic pretraining tasks that isolate and evaluate core model capabilities. Within this framework, we discover Canon layers: lightweight architectural components--named after the musical term "canon"--that promote horizontal information flow across neighboring tokens. Canon layers compute weighted sums of nearby token representations and integrate seamlessly into Transformers, linear attention, state-space models, or any sequence architecture.
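As a rough illustration of "weighted sums of nearby token representations", the sketch below mixes each token with the tokens just before it. The causal (look-back only) window, the fixed scalar weights, and the name `canon_sketch` are assumptions made for this example; the abstract does not specify the window, the weights, or how the result is combined with the rest of the network.

```python
def canon_sketch(tokens, weights):
    """Mix each token vector with its predecessors (hypothetical helper).

    tokens  -- list of equal-length float vectors, one per position
    weights -- weights[k] scales the token k positions back (k = 0 is the token itself)
    """
    dim = len(tokens[0])
    out = []
    for t in range(len(tokens)):
        mixed = [0.0] * dim
        for k, w in enumerate(weights):
            if t - k < 0:
                break  # no tokens exist before the start of the sequence
            for d in range(dim):
                mixed[d] += w * tokens[t - k][d]
        out.append(mixed)
    return out


example_tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
example_mixed = canon_sketch(example_tokens, weights=[0.5, 0.25])
```

In a trained model the weights would presumably be learned rather than fixed, and the mixing would be applied per channel inside each block; this sketch shows only the neighbor-averaging idea.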


2ea18fdc667e0ef2ad82b2b4d65147ad-Paper-Conference.pdf

Neural Information Processing Systems

Digitizing the physical world into accurate simulation-ready virtual environments offers significant opportunities in a variety of fields such as augmented and virtual reality, gaming, and robotics.


Scaling Physical Reasoning with the PHYSICS Dataset

Neural Information Processing Systems

Large Language Models (LLMs) have achieved remarkable progress on advanced reasoning tasks such as mathematics and coding competitions. Meanwhile, physics, despite being both reasoning-intensive and essential to real-world understanding, has received limited academic and industrial attention. This paper introduces PHYSICS, a dataset containing 16,568 high-quality physics problems spanning subjects and difficulty levels, to address this gap. Specifically, PHYSICS is curated with exercises from over 100 textbooks through a carefully designed pipeline for quality control. It covers five major physics domains: Mechanics, Electromagnetism, Thermodynamics, Optics, and Modern Physics.


Contrastive Self-Supervised Learning As Neural Manifold Packing

Neural Information Processing Systems

Contrastive self-supervised learning based on point-wise comparisons has been widely studied for vision tasks. In the visual cortex of the brain, neuronal responses to distinct stimulus classes are organized into geometric structures known as neural manifolds. Accurate classification of stimuli can be achieved by effectively separating these manifolds, akin to solving a packing problem. We introduce Contrastive Learning As Manifold Packing (CLAMP), a self-supervised framework that recasts representation learning as a manifold packing problem. CLAMP introduces a loss function inspired by the potential energy of short-range repulsive particle systems, such as those encountered in the physics of simple liquids and jammed packings.
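To make the physical analogy concrete, the sketch below computes a generic short-range repulsive pair energy of the kind used for soft-sphere packings: two particles contribute only when they overlap, and the energy falls to zero once they are separated. This is not CLAMP's loss, which the abstract does not give; the harmonic form, the single shared `radius`, and the function name are assumptions chosen for illustration.

```python
import math


def repulsive_energy(points, radius):
    """Sum a harmonic overlap penalty over all pairs of points (hypothetical helper).

    Each point is treated as a sphere of the given radius. A pair contributes
    (1 - d / (2 * radius)) ** 2 when its centers are closer than 2 * radius,
    and nothing otherwise, so the interaction is strictly short-range.
    """
    energy = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = math.dist(points[i], points[j])
            overlap = 1.0 - dist / (2.0 * radius)
            if overlap > 0.0:
                energy += overlap ** 2
    return energy


# Two overlapping points and one far away: only the close pair contributes.
example_energy = repulsive_energy([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], radius=1.0)
```

Minimizing an energy of this shape pushes overlapping objects apart and leaves well-separated ones untouched, which is the packing intuition the abstract appeals to.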


VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models

Neural Information Processing Systems

Recent advancements in text-to-video (T2V) diffusion models have enabled high-fidelity and realistic video synthesis. However, current T2V models often struggle to generate physically plausible content due to their limited inherent ability to accurately understand physics. We found that while the representations within T2V models possess some capacity for physics understanding, they lag significantly behind those from recent video self-supervised learning methods. To this end, we propose a novel framework called VideoREPA, which distills physics understanding capability from video understanding foundation models into T2V models by aligning token-level relations. This closes the physics understanding gap and enables more physics-plausible generation. Specifically, we introduce the Token Relation Distillation (TRD) loss, leveraging spatio-temporal alignment to provide soft guidance suitable for finetuning powerful pre-trained T2V models--a critical departure from prior representation alignment (REPA) methods. To our knowledge, VideoREPA is the first REPA method designed for finetuning T2V models and specifically for injecting physical knowledge. Empirical evaluations show that VideoREPA substantially enhances the physics commonsense of the baseline method, CogVideoX, achieving significant improvement on relevant benchmarks and demonstrating a strong capacity for generating videos consistent with intuitive physics. Code and more video results are available at https://videorepa.github.io/.


Physics-informed Neural Operator for Pansharpening

Neural Information Processing Systems

Over the past decades, pansharpening has contributed greatly to numerous remote sensing applications, with methods evolving from theoretically grounded models to deep learning approaches and their hybrids. Though promising, existing methods rarely address pansharpening through the lens of underlying physical imaging processes. In this work, we revisit the spectral imaging mechanism and propose a novel physics-informed neural operator framework for pansharpening, termed PINO, which faithfully models the end-to-end electro-optical sensor process. Specifically, PINO operates as: (1) First, a spatial-spectral encoder pair is introduced to aggregate multi-granularity high-resolution panchromatic (PAN) and low-resolution multispectral (LRMS) features.


Towards Prospective Medical Image Reconstruction via Knowledge-Informed Dynamic Optimal Transport

Neural Information Processing Systems

Medical image reconstruction from measurement data is a vital but challenging inverse problem. Deep learning approaches have achieved promising results, but often require paired measurements and high-quality images, which are typically simulated through a forward model, i.e., retrospective reconstruction. However, training on simulated pairs commonly leads to performance degradation on real prospective data due to the retrospective-to-prospective gap caused by incomplete imaging knowledge in simulation. To address this challenge, this paper introduces imaging Knowledge-Informed Dynamic Optimal Transport (KIDOT), a novel dynamic optimal transport framework with optimality in the sense of preserving consistency with imaging physics in transport, that conceptualizes reconstruction as finding a dynamic transport path. KIDOT learns from unpaired data by modeling reconstruction as a continuous evolution path from measurements to images, guided by an imaging knowledge-informed cost function and transport equation. This dynamic and knowledge-aware approach enhances robustness and better leverages unpaired data while respecting acquisition physics. Theoretically, we demonstrate that KIDOT naturally generalizes dynamic optimal transport, ensuring its mathematical rationale and solution existence. Extensive experiments on MRI and CT reconstruction demonstrate KIDOT's superior performance.


PhySense: Sensor Placement Optimization for Accurate Physics Sensing

Neural Information Processing Systems

Physics sensing plays a central role in many scientific and engineering domains, which inherently involves two coupled tasks: reconstructing dense physical fields from sparse observations and optimizing scattered sensor placements to observe maximum information. While deep learning has made rapid advances in sparse-data reconstruction, existing methods generally omit optimization of sensor placements, leaving the mutual enhancement between reconstruction and placement on the shelf. To change this suboptimal practice, we propose PhySense, a synergistic two-stage framework that learns to jointly reconstruct physical fields and to optimize sensor placements, both aiming for accurate physics sensing. The first stage involves a flow-based generative model enhanced by cross-attention to adaptively fuse sparse observations. Leveraging the reconstruction feedback, the second stage performs sensor placement via projected gradient descent to satisfy spatial constraints. We further prove that the learning objectives of the two stages are consistent with classical variance-minimization principles, providing theoretical guarantees. Extensive experiments across three challenging benchmarks, especially a 3D geometry dataset, indicate PhySense achieves state-of-the-art physics sensing accuracy and discovers informative sensor placements previously unconsidered. Code is available at this repository: https://github.com/thuml/PhySense.
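The second stage relies on projected gradient descent, which alternates a gradient step with a projection back onto the allowed region. The sketch below shows the generic technique on a toy problem; the box constraint, the quadratic objective, and all names are assumptions for illustration, since the abstract does not describe PhySense's actual constraint set or objective.

```python
def project_to_box(x, lo, hi):
    """Clamp every coordinate into [lo, hi], the projection onto a box."""
    return [min(max(v, lo), hi) for v in x]


def projected_gradient_descent(grad_fn, x0, step, n_steps, lo=0.0, hi=1.0):
    """Take gradient steps and project back into the box after each one."""
    x = project_to_box(x0, lo, hi)
    for _ in range(n_steps):
        g = grad_fn(x)
        x = project_to_box([xi - step * gi for xi, gi in zip(x, g)], lo, hi)
    return x


# Toy objective (an assumption): squared distance to a target that lies
# partly outside the box, so the constraint is active in one coordinate.
target = [1.5, 0.3]


def toy_gradient(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]


placement = projected_gradient_descent(toy_gradient, [0.5, 0.5], step=0.25, n_steps=50)
```

The first coordinate ends up pinned at the boundary of the box while the second reaches the unconstrained optimum, which is the behavior that lets a placement stay inside a valid region while still following the reconstruction feedback.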


Horror video game gets its creepiness from a quantum computer

New Scientist

Quantum Backrooms is a horror game in which the player explores eerie rooms. A quantum computer has been used to create a horror video game called Quantum Backrooms, and it's available to play online. Peculiarities of quantum objects have long inspired philosophers and artists, and now game developers are getting the bug too. James Wootton at Moth Quantum and his colleagues developed Quantum Backrooms, a horror game with labyrinthine levels generated by a real quantum computer. The game draws inspiration from "the Backrooms," a horror legend developed on internet forums that consists of moving through a series of endless rooms.


3 things you need to know about quantum computers, from an expert

New Scientist

What use is a quantum computer? Are you imagining an ordinary computer, but somehow just better? If so, that would be a mistake, because quantum computers are fundamentally different. They rely on exotic quantum phenomena occurring between their constituent parts, known as qubits, but their strange nature often invites myths and misconceptions. Quantum computing expert Shayan Majidy at Harvard University, the lead author of, is here to get you up to speed.