nullnull
Scaling Laws for Precision in High-Dimensional Linear Regression
Zhang, Dechen, Tang, Xuan, Liang, Yingyu, Zou, Difan
Low-precision training is critical for optimizing the trade-off between model quality and training costs, necessitating the joint allocation of model size, dataset size, and numerical precision. While empirical scaling laws suggest that quantization impacts effective model and data capacities or acts as an additive error, the theoretical mechanisms governing these effects remain largely unexplored. In this work, we initiate a theoretical study of scaling laws for low-precision training within a high-dimensional sketched linear regression framework. By analyzing multiplicative (signal-dependent) and additive (signal-independent) quantization, we identify a critical dichotomy in their scaling behaviors. Our analysis reveals that while both schemes introduce an additive error and degrade the effective data size, they exhibit distinct effects on effective model size: multiplicative quantization maintains the full-precision model size, whereas additive quantization reduces the effective model size. Numerical experiments validate our theoretical findings. By rigorously characterizing the complex interplay among model scale, dataset size, and quantization error, our work provides a principled theoretical basis for optimizing training protocols under practical hardware constraints.
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > Canada (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > Minnesota (0.04)
- (2 more...)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Education > Educational Setting > Online (0.50)
- Government (0.45)
- Europe > France > Bourgogne-Franche-Comté > Doubs > Besançon (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Germany > Berlin (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Russia (0.04)
- Asia > Middle East > Saudi Arabia > Mecca Province > Thuwal (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.92)
- Research Report > Experimental Study (0.92)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China (0.04)
f66340d6f28dae6aab0176892c9065e7-Supplemental-Conference.pdf
Once closed-form expressions for these Jacobians are derived, it remains to substitute those expressions into (16). The following identity (often termed the "vec" rule) will To depict the spatial topographies of the latent components measured on the EEG and fMRI analyses, the "forward-model" [ The results of the comparison are shown in Fig S1, where it is clear that the signal fidelity of the GCs (right panel) significantly exceeds those yielded by PCA (left) and ICA (middle). GCA is only able to recover sources with temporal dependencies (i.e., s Both the single electrodes and Granger components exhibit two pronounced peaks in the spectra: one near 2 Hz ("delta" Fig S3 shows the corresponding result for the left motor imagery condition. EEG motor imagery dataset described in the main text. For each technique, the first 6 components are presented.
- South America > Chile > Arica y Parinacota Region > Arica Province > Arica (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Doubly Robust Augmented Transfer for Meta-Reinforcement Learning
RL problems through the idea of "learning to learn". Current meta-RL methods can be classified in to two categories. These methods mainly differ in their ways of inference [3, 4, 20]. The other line follows the technique of relabeling that enables sample reuse across tasks, i.e., learning a task Packer et al. apply hindsight relabeling for meta-RL, and propose hindsight task relabeling (HTR) to relabel the trajectories Taking a step further than hindsight relabelling, Wan et al. introduce additionally foresight Huang et al. derive a general form of policy gradient from DR value estimator [29], whereas a DR off-policy actor-critic Kallus et al. propose the doubly robust method to find a robust policy that can Depending on the knowledge to be transferred, these methods in RL can be roughly divided into classes including sampled transitions [32, 33], learned policies or value networks [34, 35, 36, 37], features [38, 39, 40], and skills [41, 42]. Doubly Robust Property for Direct Use of Doubly Robust Estimator We show the doubly robust property of the DR estimator for value function in Eq. (5) in the main text, as follows.
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.30)