Learning the joint distribution of two sequences using little or no paired data