From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects