Open-vocabulary Pick and Place via Patch-level Semantic Maps