


Appendix A Training details


Models are trained with Stochastic Gradient Descent with momentum equal to 0.9. We use a learning rate annealing scheme, decreasing the learning rate by a factor of 0.1 every 30 epochs. We train all models for 150 epochs. We then select the best learning rate and weight decay for each method and run 5 different seeds to report the mean and standard deviation. We use the validation set of ImageNet to perform cross-validation and report performance on it. In Section G we train the Augerino method on top of the ResNet-18 architecture.
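The snippet below is a minimal PyTorch sketch of this schedule, assuming a standard SGD optimizer and a StepLR scheduler; the starting learning rate and weight decay shown are placeholders, since the text selects the best values per method by cross-validation, and train_one_epoch is a hypothetical helper.

```python
import torch
from torchvision.models import resnet18

# Minimal sketch of the training setup described above.
# The learning rate and weight decay are placeholders: the best values are
# chosen per method by cross-validation on the ImageNet validation set.
model = resnet18()
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # placeholder starting value
    momentum=0.9,       # momentum stated in the text
    weight_decay=1e-4,  # placeholder value
)
# Anneal the learning rate by a factor of 0.1 every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(150):  # all models are trained for 150 epochs
    # train_one_epoch(model, optimizer)  # hypothetical per-epoch training loop
    scheduler.step()
```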


We provide a simple pseudo-2


We thank all the reviewers for their constructive comments. We will provide details in the final draft. MCUNet shows consistent improvement across different devices (F746, H743) and tasks (classification, detection).

R1: Whether the overall network topology brings major improvement.
R2: Why the auto-tuning in TVM fails to work on MCUs.



Appendix A

A.1 Detailed explanation of continuous nature of similarity


In this section, we expand on our observation that similarity between training samples is not binary. Consider the images shown in Figure 6. Standard contrastive objectives assign each pair a binary label: an example is either a positive or a negative of the anchor. As a consequence, any similarity between the anchor image and the so-called 'negative' examples is completely ignored. Further, all 'positive' examples are considered to be equally similar to the anchor.

The batch size is set to 16000. We train on 4 A100 GPUs.
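As an illustration of the binary treatment of similarity described above, the sketch below shows a standard InfoNCE-style contrastive loss (not necessarily the exact objective used here): the single positive is labelled fully similar and every negative fully dissimilar, however visually close a negative may be to the anchor. The function name and tensor shapes are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def infonce_binary_similarity(anchor, positive, negatives, temperature=0.1):
    """Illustrative InfoNCE-style loss with binary similarity labels.

    anchor:    (d,)   L2-normalized embedding of the anchor image.
    positive:  (d,)   L2-normalized embedding of its 'positive' view.
    negatives: (n, d) L2-normalized embeddings of the 'negative' examples.
    """
    pos_logit = (anchor @ positive) / temperature    # similarity to the one positive
    neg_logits = (negatives @ anchor) / temperature  # similarities to all negatives
    logits = torch.cat([pos_logit.unsqueeze(0), neg_logits]).unsqueeze(0)
    # Target class 0 marks the positive; every negative is pushed away equally,
    # so any real similarity between the anchor and a negative is ignored.
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits, target)
```

A continuous notion of similarity would instead replace this hard 0/1 target with soft labels reflecting how alike the anchor and each example actually are.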