Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT