Adam-family Methods with Decoupled Weight Decay in Deep Learning