Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey