Ovis: Structural Embedding Alignment for Multimodal Large Language Model