SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining