Multi-modal Representation Learning for Social Post Location Inference