Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment