HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding