Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent