A 1/R Law for Kurtosis Contrast in Balanced Mixtures
Bi, Yuda, Xiao, Wenjun, Bai, Linhao, Calhoun, Vince D
Kurtosis-based Independent Component Analysis (ICA) weakens in wide, balanced mixtures: the kurtosis contrast of a balanced mixture of R sources decays as 1/R. We also show that purification, selecting m ≪ R sign-consistent sources, restores an R-independent contrast of Ω(1/m), and we give a simple data-driven selection heuristic. Synthetic experiments validate the predicted 1/R decay, the T crossover, and the contrast recovery.

ICA recovers statistically independent latent sources from linear mixtures and is identifiable whenever at most one source is Gaussian [1]. Excess kurtosis, the standardized fourth cumulant, is a central ICA contrast function [9], and kurtosis-type nonlinearities remain standard in FastICA.
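As a minimal numerical illustration of the 1/R law (a sketch under simplified assumptions, not the authors' code), the numpy snippet below mixes R independent Laplace sources, each with excess kurtosis 3, through a balanced unit-norm weight vector. For such a mixture the excess kurtosis is sum_i w_i^4 * kappa_i = 3/R, so the measured contrast should track 3/R as R grows.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000  # samples per kurtosis estimate

    def excess_kurtosis(x):
        x = (x - x.mean()) / x.std()
        return (x ** 4).mean() - 3.0

    for R in (2, 8, 32, 128):
        s = rng.laplace(size=(R, n))   # independent Laplace sources, excess kurtosis 3
        w = np.ones(R) / np.sqrt(R)    # balanced, unit-norm mixing weights
        y = w @ s
        print(f"R={R:4d}  kurt(y)={excess_kurtosis(y):+.4f}  3/R={3 / R:+.4f}")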
Scaling Laws for Precision in High-Dimensional Linear Regression
Zhang, Dechen, Tang, Xuan, Liang, Yingyu, Zou, Difan
Low-precision training is critical for optimizing the trade-off between model quality and training costs, necessitating the joint allocation of model size, dataset size, and numerical precision. While empirical scaling laws suggest that quantization impacts effective model and data capacities or acts as an additive error, the theoretical mechanisms governing these effects remain largely unexplored. In this work, we initiate a theoretical study of scaling laws for low-precision training within a high-dimensional sketched linear regression framework. By analyzing multiplicative (signal-dependent) and additive (signal-independent) quantization, we identify a critical dichotomy in their scaling behaviors. Our analysis reveals that while both schemes introduce an additive error and degrade the effective data size, they exhibit distinct effects on effective model size: multiplicative quantization maintains the full-precision model size, whereas additive quantization reduces the effective model size. Numerical experiments validate our theoretical findings. By rigorously characterizing the complex interplay among model scale, dataset size, and quantization error, our work provides a principled theoretical basis for optimizing training protocols under practical hardware constraints.
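The dichotomy is easy to probe numerically. The toy sketch below is an illustration under simplified assumptions, not the paper's sketched-regression setup: it injects Gaussian stand-ins for multiplicative (signal-dependent) and additive (signal-independent) quantization error into the design matrix of an ordinary least-squares problem and compares the resulting parameter error.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 4000, 50
    X = rng.normal(size=(n, d))
    beta = rng.normal(size=d) / np.sqrt(d)
    y = X @ beta + 0.1 * rng.normal(size=n)

    def lstsq_err(Xq):
        # Parameter error of least squares fit on the quantized design
        beta_hat = np.linalg.lstsq(Xq, y, rcond=None)[0]
        return np.linalg.norm(beta_hat - beta)

    for sigma in (0.05, 0.1, 0.2):
        X_mult = X * (1.0 + sigma * rng.normal(size=X.shape))  # multiplicative (signal-dependent)
        X_add = X + sigma * rng.normal(size=X.shape)           # additive (signal-independent)
        print(f"sigma={sigma:.2f}  mult={lstsq_err(X_mult):.4f}  add={lstsq_err(X_add):.4f}")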
Thermodynamic Isomorphism of Transformers: A Lagrangian Approach to Attention Dynamics
We propose an effective field-theoretic framework for analyzing Transformer attention through a thermodynamic lens. By constructing a Lagrangian on the information manifold equipped with the Fisher metric, we show that, within the Shannon--Boltzmann entropy framework, the Softmax function arises as a stationary solution minimizing a Helmholtz free energy functional. This establishes a formal correspondence between scaled dot-product attention and canonical ensemble statistics. Extending this mapping to macroscopic observables, we define an effective specific heat associated with fluctuations of the attention energy landscape. In controlled experiments on the modular addition task ($p = 19$--$113$), we observe a robust peak in this fluctuation measure that consistently precedes the onset of generalization. While no asymptotic power-law divergence is detected in this finite-depth regime, the reproducible enhancement of energy variance suggests a critical-like crossover accompanying representational reorganization. Our framework provides a unified statistical-mechanical perspective on attention scaling, training dynamics, and positional encoding, interpreting the phenomena as emergent properties of an effective thermodynamic system rather than isolated heuristics. Although the present results indicate finite-size crossover behavior rather than a strict phase transition, they motivate further investigation into scaling limits of deep architectures through fluctuation-based observables.
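The softmax-as-free-energy claim admits a direct numerical check. The sketch below is a minimal verification, not the paper's derivation; the energies E and temperature T are arbitrary placeholders. It minimizes the Helmholtz functional F(p) = <E>_p - T*H(p) over the probability simplex by gradient descent on logits, and confirms that the minimizer matches the Boltzmann/softmax distribution p_i proportional to exp(-E_i / T).

    import numpy as np

    rng = np.random.default_rng(0)
    E = rng.normal(size=6)  # per-key "energies" (negated attention scores)
    T = 1.0                 # temperature

    z = np.zeros(6)  # logits parameterizing p = softmax(z) on the simplex
    for _ in range(5000):
        p = np.exp(z - z.max()); p /= p.sum()
        dF_dp = E + T * (np.log(p) + 1.0)  # gradient of <E>_p + T * sum(p log p)
        grad_z = p * (dF_dp - p @ dF_dp)   # chain rule through the softmax
        z -= 0.5 * grad_z

    boltzmann = np.exp(-E / T); boltzmann /= boltzmann.sum()
    print(np.round(p, 4))          # numerical minimizer of F
    print(np.round(boltzmann, 4))  # closed-form canonical-ensemble / softmax solution

At the stationary point dF_dp is constant across components, i.e. E_i + T log p_i = const, which is exactly the canonical-ensemble form the abstract describes.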
The Cost of Learning under Multiple Change Points
Gafni, Tomer, Iyengar, Garud, Zeevi, Assaf
We consider an online learning problem in environments with multiple change points. In contrast to the single change point problem that is widely studied using classical "high confidence" detection schemes, the multiple change point environment presents new learning-theoretic and algorithmic challenges. Specifically, we show that classical methods may exhibit catastrophic failure (high regret) due to a phenomenon we refer to as endogenous confounding. To overcome this, we propose a new class of learning algorithms dubbed Anytime Tracking CUSUM (ATC). These are horizon-free online algorithms that implement a selective detection principle, balancing the need to ignore "small" (hard-to-detect) shifts, while reacting "quickly" to significant ones. We prove that the performance of a properly tuned ATC algorithm is nearly minimax-optimal; its regret is guaranteed to closely match a novel information-theoretic lower bound on the achievable performance of any learning algorithm in the multiple change point problem. Experiments on synthetic as well as real-world data validate the aforementioned theoretical findings.
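For intuition on the selective-detection principle, the sketch below is a generic one-sided CUSUM restarted after each alarm, not the authors' ATC algorithm; the drift and threshold values are illustrative. The drift term makes the statistic ignore shifts much smaller than its value, while the threshold trades detection delay against false alarms.

    import numpy as np

    def cusum_alarms(x, mu0=0.0, drift=0.5, threshold=10.0):
        """One-sided CUSUM for upward mean shifts away from baseline mu0.

        Shifts much smaller than `drift` are deliberately ignored; the
        statistic restarts after every alarm to handle multiple change
        points. (A practical detector would also re-estimate the baseline.)
        """
        alarms, s = [], 0.0
        for t, xt in enumerate(x):
            s = max(0.0, s + (xt - mu0) - drift)
            if s > threshold:
                alarms.append(t)
                s = 0.0
        return alarms

    rng = np.random.default_rng(0)
    x = np.concatenate([
        rng.normal(0.0, 1.0, 300),  # pre-change regime
        rng.normal(0.1, 1.0, 300),  # "small" shift: typically ignored
        rng.normal(2.0, 1.0, 300),  # "large" shift: detected within a few samples
    ])
    # First alarm lands shortly after t = 600 and repeats while the shift persists
    print(cusum_alarms(x))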