neural
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario > Toronto (0.04)
Deriving Neural Scaling Laws from the statistics of natural language
Cagnetta, Francesco, Raventós, Allan, Ganguli, Surya, Wyart, Matthieu
Despite the fact that experimental neural scaling laws have substantially guided empirical progress in large-scale machine learning, no existing theory can quantitatively predict the exponents of these important laws for any modern LLM trained on any natural language dataset. We provide the first such theory in the case of data-limited scaling laws. We isolate two key statistical properties of language that alone can predict neural scaling exponents: (i) the decay of pairwise token correlations with time separation between token pairs, and (ii) the decay of the next-token conditional entropy with the length of the conditioning context. We further derive a simple formula in terms of these statistics that predicts data-limited neural scaling exponents from first principles without any free parameters or synthetic data models. Our theory exhibits a remarkable match with experimentally measured neural scaling laws obtained from training GPT-2 and LLaMA style models from scratch on two qualitatively different benchmarks, TinyStories and WikiText.
- North America > United States > California > Santa Clara County > Stanford (0.14)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (2 more...)
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- (2 more...)
- Information Technology (1.00)
- Health & Medicine (0.68)
- Transportation > Ground > Road (0.68)
- North America > United States > California (0.04)
- Asia > Middle East > Israel (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
On the origin of neural scaling laws: from random graphs to natural language
Barkeshli, Maissam, Alfarano, Alberto, Gromov, Andrey
Scaling laws have played a major role in the modern AI revolution, providing practitioners predictive power over how the model performance will improve with increasing data, compute, and number of model parameters. This has spurred an intense interest in the origin of neural scaling laws, with a common suggestion being that they arise from power law structure already present in the data. In this paper we study scaling laws for transformers trained to predict random walks (bigrams) on graphs with tunable complexity. We demonstrate that this simplified setting already gives rise to neural scaling laws even in the absence of power law structure in the data correlations. We further consider dialing down the complexity of natural language systematically, by training on sequences sampled from increasingly simplified generative language models, from 4,2,1-layer transformer language models down to language bigrams, revealing a monotonic evolution of the scaling exponents. Our results also include scaling laws obtained from training on random walks on random graphs drawn from Erdös-Renyi and scale-free Barabási-Albert ensembles. Finally, we revisit conventional scaling laws for language modeling, demonstrating that several essential results can be reproduced using 2 layer transformers with context length of 50, provide a critical analysis of various fits used in prior literature, demonstrate an alternative method for obtaining compute optimal curves as compared with current practice in published literature, and provide preliminary evidence that maximal update parameterization may be more parameter efficient than standard parameterization.
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Kansas > Sheridan County (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
GSDF: 3DGS Meets SDF for Improved Neural Rendering and Reconstruction
Representing 3D scenes from multiview images remains a core challenge in computer vision and graphics, requiring both reliable rendering and reconstruction, which often conflicts due to the mismatched prioritization of image quality over precise underlying scene geometry. Although both neural implicit surfaces and explicit Gaussian primitives have advanced with neural rendering techniques, current methods impose strict constraints on density fields or primitive shapes, which enhances the affinity for geometric reconstruction at the sacrifice of rendering quality. To address this dilemma, we introduce GSDF, a dual-branch architecture combining 3D Gaussian Splatting (3DGS) and neural Signed Distance Fields (SDF). Our approach leverages mutual guidance and joint supervision during the training process to mutually enhance reconstruction and rendering. Specifically, our method guides the Gaussian primitives to locate near potential surfaces and accelerates the SDF convergence. This implicit mutual guidance ensures robustness and accuracy in both synthetic and real-world scenarios. Experimental results demonstrate that our method boosts the SDF optimization process to reconstruct more detailed geometry, while reducing floaters and blurry edge artifacts in rendering by aligning Gaussian primitives with the underlying geometry.