Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems 

The authors propose a vision method to estimate where within an image a pictured head is looking. Given the image and the cropped out head, the system returns a saliency map consisting of confidence ratings on grid cells saying how likely that position is to be the subject of that head (person)'s gaze. The technique uses CNN with two pathways, one for the head/gaze and one for the full image/saliency of the scene. This dataset contains in total 36K people. The method is compared to a few reasonable baselines that represent alternative approaches one might implement and sanity checks.