Egocentric Video

A Self Validation Network for Object-Level Human Attention Estimation

Zehua Zhang, Chen Yu, David Crandall

Neural Information Processing Systems

Some recent work [22, 66, 68] has discussed estimating probability maps of ego-attention or predicting gaze points in egocentric videos. However, people think not in terms of points in their field of view, but in terms of the objects that they are attending to. Of course, the object of interest could be obtained by first estimating the gaze with a gaze estimator, generating object candidates from an off-the-shelf object detector, and then picking the object that the estimated gaze falls in. Because this bottom-up approach estimates where and what separately, it could be doomed to fail if the eye gaze prediction is slightly inaccurate, such as falling between two objects or in the intersection of multiple object bounding boxes (Figure 1).
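The bottom-up baseline criticized above can be sketched in a few lines; the function name and detection tuple layout are illustrative assumptions, not the paper's actual code:

```python
def pick_attended_object(gaze_xy, detections):
    """Bottom-up baseline: return the detected object whose bounding box
    contains the estimated gaze point, preferring the highest-scoring box.

    gaze_xy:    (x, y) estimated gaze point in image coordinates
    detections: list of (x1, y1, x2, y2, score, label) tuples from any
                off-the-shelf object detector (layout assumed here)
    """
    gx, gy = gaze_xy
    # Keep only boxes that contain the gaze point.
    hits = [d for d in detections
            if d[0] <= gx <= d[2] and d[1] <= gy <= d[3]]
    if not hits:
        # Failure mode noted in the abstract: the predicted gaze falls
        # between objects, so no box is selected at all.
        return None
    # If the gaze lands in the intersection of several boxes, an arbitrary
    # tie-break (here, detector confidence) decides -- the other failure mode.
    return max(hits, key=lambda d: d[4])
```

The two `return` paths make the abstract's point concrete: a slightly wrong gaze point either selects nothing or forces an arbitrary choice among overlapping boxes, because "where" and "what" were estimated separately.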




Look Ma, No Hands!

Neural Information Processing Systems

The analysis and use of egocentric videos for robotic tasks is made challenging by occlusion due to the hand and the visual mismatch between the human hand and a robot end-effector. In this sense, the human hand presents a nuisance. However, hands often also provide a valuable signal, e.g. the hand pose may suggest what kind of object is being held.


EgoTaskQA: Understanding Human Tasks in Egocentric Videos

Neural Information Processing Systems

These questions are divided into four types, including descriptive (what status?), predictive (what will?), explanatory (what caused?), and counterfactual (what if?), to provide diagnostic analyses on spatial, temporal, and causal understandings of goal-oriented tasks. We show an illustrative scenario where two subjects collaborate to make and drink cereal.