Memory based fusion for multi-modal deep learning