Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models