Pretrain Soft Q-Learning with Imperfect Demonstrations