GeoGrid-Bench: Can Foundation Models Understand Multimodal Gridded Geo-Spatial Data?