Steerable Scene Generation with Post Training and Inference-Time Search