Towards Learning Cross-Modal Perception-Trace Models