Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head