A Modular End-to-End Multimodal Learning Method for Structured and Unstructured Data