Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter