CAVACHON: a hierarchical variational autoencoder to integrate multi-modal single-cell data