Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data