Towards Interpretable Visuo-Tactile Predictive Models for Soft Robot Interactions