Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

Neural Information Processing Systems 

Our framework starts from the layer-wise compression problem described above, by which the global compression task, defined either for pruning or quantization, is first split into layer-wise sub-problems, based on the layer behavior on the calibration data.
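The layer-wise split can be illustrated with a minimal sketch. It assumes the common squared-error formulation, in which each layer's compressed weights are judged by how closely the layer's output on calibration inputs matches the original output. All names here (`layer_error`, `magnitude_prune`, `compress_layerwise`) are hypothetical, and magnitude pruning is only a simple stand-in compressor, not the solver used by the framework.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def layer_error(w: Matrix, w_hat: Matrix, x: Matrix) -> float:
    """Squared output error ||W X - W_hat X||^2 on calibration inputs X.

    Assumption: columns of X are calibration samples for this layer.
    """
    ref = matmul(w, x)
    out = matmul(w_hat, x)
    return sum((r - o) ** 2 for rr, ro in zip(ref, out) for r, o in zip(rr, ro))


def magnitude_prune(w: Matrix, keep: int) -> Matrix:
    """Stand-in compressor: keep the `keep` largest-magnitude weights per row."""
    pruned = []
    for row in w:
        order = sorted(range(len(row)), key=lambda j: abs(row[j]), reverse=True)
        kept = set(order[:keep])
        pruned.append([v if j in kept else 0.0 for j, v in enumerate(row)])
    return pruned


def compress_layerwise(
    weights: List[Matrix],
    calib_inputs: List[Matrix],
    compress: Callable[[Matrix], Matrix],
) -> List[Tuple[Matrix, float]]:
    """Treat every layer as an independent sub-problem.

    Each layer is compressed on its own and scored by its output error
    on that layer's calibration inputs.
    """
    results = []
    for w, x in zip(weights, calib_inputs):
        w_hat = compress(w)
        results.append((w_hat, layer_error(w, w_hat, x)))
    return results


# Toy usage: one 2x2 layer, identity calibration inputs, one weight kept per row.
w = [[1.0, 0.1], [0.2, 2.0]]
x = [[1.0, 0.0], [0.0, 1.0]]
results = compress_layerwise([w], [x], lambda m: magnitude_prune(m, keep=1))
```

Any per-layer compressor, whether for pruning or quantization, could be passed as `compress`; the point of the sketch is only that each layer is handled and scored independently against its own calibration data.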
