MultiModN: Multimodal, Multi-Task, Interpretable Modular Networks
Neural Information Processing Systems
Predicting multiple real-world tasks in a single model often requires a particularly diverse feature space. Multimodal (MM) models aim to extract the synergistic predictive potential of multiple data types to create a shared feature space with aligned semantic meaning across inputs of drastically varying sizes. Most current MM architectures fuse these representations in parallel, which not only limits their interpretability but also creates a dependency on modality availability. We present MultiModN, a multimodal, multi-task, interpretable modular network that fuses representations sequentially. MultiModN's composable pipeline is interpretable-by-design, as well as innately multi-task and robust to the fundamental issue of biased missingness. We perform four experiments on several benchmark MM datasets across 10 real-world tasks (predicting medical diagnoses, academic performance, and weather), and show that MultiModN's sequential MM fusion does not compromise performance compared with a baseline of parallel fusion.
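To make the contrast with parallel fusion concrete, the sequential fusion idea in the abstract can be sketched as a toy state-passing pipeline. This is a hypothetical illustration, not the authors' implementation: the function names, state representation, and encoder shapes are assumptions. Each modality-specific encoder updates a shared state in turn, and a missing modality is simply skipped, which is one way a sequential design can avoid a hard dependency on modality availability.

```python
# Toy sketch of sequential multimodal fusion (assumed names/shapes, not the
# paper's code). Each modality encoder folds its input into a shared state;
# a missing modality (None) is skipped, so the pipeline degrades gracefully.

from typing import Callable, Optional, Sequence

State = list  # toy state: a list of floats

def sequential_fusion(
    state: State,
    inputs: Sequence[Optional[float]],
    encoders: Sequence[Callable[[State, float], State]],
) -> State:
    """Fold each available modality input into the shared state, in order."""
    for x, encode in zip(inputs, encoders):
        if x is None:          # modality missing: keep current state
            continue
        state = encode(state, x)
    return state

# Toy encoders standing in for learned modality modules.
enc_a = lambda s, x: [v + x for v in s]   # "modality A" shifts the state
enc_b = lambda s, x: [v * x for v in s]   # "modality B" scales the state

state0 = [1.0, 2.0]
fused = sequential_fusion(state0, [3.0, None], [enc_a, enc_b])
print(fused)  # → [4.0, 5.0]; modality B was missing, so only enc_a applied
```

Because the state is updated one modality at a time, intermediate states can also be read out after each step, which is the kind of granular, per-step feedback that motivates an interpretable-by-design sequential pipeline.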