Review for NeurIPS paper: FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
–Neural Information Processing Systems
The PFQ algorithm introduces many hyperparameters, and I am curious how the authors chose the parameters \epsilon and \alpha. The authors state only that these parameters are determined from the four-stage manual PFQ in Figure 1, and then claim that FracTrain is insensitive to hyperparameters. First, the precision choices of the four-stage PFQ in Figure 1 are themselves arbitrary. Second, I do not think the empirical results support the claim that FracTrain is insensitive to hyperparameters. I would encourage the authors to include an ablation study over \epsilon and \alpha.
Jan-26-2025, 12:20:42 GMT