lagrangian
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California (0.14)
- North America > United States > Oregon (0.04)
- North America > United States > New York (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > Strength High (0.68)
- Law (1.00)
- Health & Medicine > Consumer Health (0.67)
- Health & Medicine > Government Relations & Public Policy (0.46)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Thermodynamic Isomorphism of Transformers: A Lagrangian Approach to Attention Dynamics
We propose an effective field-theoretic framework for analyzing Transformer attention through a thermodynamic lens. By constructing a Lagrangian on the information manifold equipped with the Fisher metric, we show that, within the Shannon--Boltzmann entropy framework, the Softmax function arises as a stationary solution minimizing a Helmholtz free energy functional. This establishes a formal correspondence between scaled dot-product attention and canonical ensemble statistics. Extending this mapping to macroscopic observables, we define an effective specific heat associated with fluctuations of the attention energy landscape. In controlled experiments on the modular addition task ($p = 19$--$113$), we observe a robust peak in this fluctuation measure that consistently precedes the onset of generalization. While no asymptotic power-law divergence is detected in this finite-depth regime, the reproducible enhancement of energy variance suggests a critical-like crossover accompanying representational reorganization. Our framework provides a unified statistical-mechanical perspective on attention scaling, training dynamics, and positional encoding, interpreting the phenomena as emergent properties of an effective thermodynamic system rather than isolated heuristics. Although the present results indicate finite-size crossover behavior rather than a strict phase transition, they motivate further investigation into scaling limits of deep architectures through fluctuation-based observables.
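The correspondence described in the abstract can be checked numerically on a toy example. The sketch below is not the authors' code: it assumes attention energies $E_i = -s_i$ for raw scores $s_i$, treats the scaling factor as a temperature $T$, and uses the textbook canonical-ensemble definitions $F[p] = \langle E \rangle - T S[p]$ and $C = \mathrm{Var}(E)/T^2$. The function names are hypothetical.

```python
import numpy as np


def attention_weights(scores, temperature=1.0):
    """Softmax of scores / temperature, i.e. a Boltzmann distribution
    over energies E_i = -scores_i (assumed mapping, not from the paper)."""
    z = scores / temperature
    z = z - z.max()  # shift for numerical stability; does not change the result
    w = np.exp(z)
    return w / w.sum()


def free_energy(p, energies, temperature):
    """Helmholtz free energy functional F[p] = <E> - T * S[p]."""
    entropy = -np.sum(p * np.log(p + 1e-300))  # Shannon entropy in nats
    return np.sum(p * energies) - temperature * entropy


def specific_heat(p, energies, temperature):
    """Fluctuation formula C = Var(E) / T^2 (with k_B set to 1)."""
    mean_energy = np.sum(p * energies)
    variance = np.sum(p * (energies - mean_energy) ** 2)
    return variance / temperature**2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_k = 16
    query = rng.normal(size=d_k)
    keys = rng.normal(size=(8, d_k))

    scores = keys @ query          # raw dot products
    energies = -scores             # assumed energy convention
    T = np.sqrt(d_k)               # assumed: scaling factor plays the role of temperature

    p = attention_weights(scores, T)
    log_Z = np.log(np.sum(np.exp(-energies / T)))

    print("F[softmax]   :", free_energy(p, energies, T))
    print("-T log Z     :", -T * log_Z)
    print("specific heat:", specific_heat(p, energies, T))
```

Under these assumptions, `free_energy` evaluated at the softmax weights coincides with $-T \log Z$, and any other distribution $q$ gives a larger value, since $F[q] - F[p] = T\,\mathrm{KL}(q \,\|\, p) \ge 0$.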
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Connecticut > Hartford County > Hartford (0.04)
- North America > United States > Connecticut > Hartford County > East Hartford (0.04)
- North America > United States (0.14)
- North America > Canada (0.04)
Supplementary Information A: The principle of least action and the Euler-Lagrange equation. Here, we review the principle of least action and the derivation of the Euler-Lagrange equation [...]
Now, let us derive the differential equation that gives a solution to the variational problem. ... This condition yields the Euler-Lagrange equation, $\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} = \frac{\partial L}{\partial q}$. ... Here, we derive Noether's learning dynamics by applying Noether's theorem to the ... A general form of Noether's theorem relates the dynamics of Noether ... By evaluating the right-hand side of Eq. 23, we get ... Now, we harness the covariant property of the Lagrangian formulation, i.e., it preserves the form ... Plugging this expression obtained from the steady-state condition of Eq. 27 ... Here, we ignore the inertia term in Eq. 16, assuming that the mass (learning rate) is finite but small ... All the experiments were run using the PyTorch code base. We used the Tiny ImageNet dataset to generate all the empirical figures in this work. The key hyperparameters we used are listed with each figure.
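For reference, the Euler-Lagrange equation mentioned in this excerpt follows from the stationarity of the action in a few lines. This is the standard textbook derivation, not text recovered from the paper; the endpoints $t_1$, $t_2$ and the variation $\delta q$ are notation assumed here.

```latex
\begin{align}
S[q] &= \int_{t_1}^{t_2} L(q, \dot{q}, t)\, dt, \\
\delta S &= \int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q}\,\delta q
          + \frac{\partial L}{\partial \dot{q}}\,\delta \dot{q} \right) dt \\
         &= \left[ \frac{\partial L}{\partial \dot{q}}\,\delta q \right]_{t_1}^{t_2}
          + \int_{t_1}^{t_2} \left( \frac{\partial L}{\partial q}
          - \frac{d}{dt}\frac{\partial L}{\partial \dot{q}} \right) \delta q \, dt .
\end{align}
```

With fixed endpoints, $\delta q(t_1) = \delta q(t_2) = 0$, the boundary term vanishes. Requiring $\delta S = 0$ for every admissible $\delta q$ then gives $\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} = \frac{\partial L}{\partial q}$.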
- Asia > Middle East > Jordan (0.04)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- North America > United States (0.28)
- Asia > India (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > Canada (0.04)