Modular Deep Learning Framework for Assistive Perception: Gaze, Affect, and Speaker Identification