Practical Cross-modal Manifold Alignment for Grounded Language