Set-the-Scene: Global-Local Training for Generating Controllable NeRF Scenes