Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with Communication Compression