LearningState-AwareVisualRepresentationsfrom AudibleInteractions