
Dynamic Features for Visual Speechreading: A Systematic Comparison

Neural Information Processing Systems

Humans use visual as well as auditory speech signals to recognize spoken words. A variety of systems have been investigated for performing this task. The main purpose of this research was to systematically compare the performance of a range of dynamic visual features on a speechreading task. We have found that normalization of images to eliminate variation due to translation, scale, and planar rotation yielded substantial improvements in generalization performance regardless of the visual representation used. In addition, the dynamic information in the difference between successive frames yielded better performance than optical-flow based approaches, and compression by local low-pass filtering worked surprisingly better than global principal components analysis (PCA). These results are examined and possible explanations are explored.
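
As a rough illustration of two ideas named in the abstract, the sketch below computes differences between successive frames and then compresses each difference image by local averaging. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `frame_differences` and `block_average`, the 2x2 block size, the box-filter kernel (standing in for whatever low-pass filter the paper used), and the pixel values are all hypothetical.

```python
# Illustrative sketch only: names, block size, and pixel values are assumptions,
# not details taken from the paper.

def frame_differences(frames):
    """Return pixelwise differences between successive frames (delta frames)."""
    return [
        [[b - a for a, b in zip(row_prev, row_next)]
         for row_prev, row_next in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def block_average(image, block=2):
    """Compress by local low-pass filtering: average each block x block patch.

    Assumes the image height and width are multiples of `block`.
    """
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[r + i][c + j] for i in range(block) for j in range(block))
            / (block * block)
            for c in range(0, width, block)
        ]
        for r in range(0, height, block)
    ]


# Two tiny 2x4 grayscale "frames" with made-up pixel values.
frames = [
    [[0, 0, 4, 4], [0, 0, 4, 4]],
    [[2, 2, 4, 4], [2, 2, 8, 8]],
]
deltas = frame_differences(frames)
features = [block_average(d) for d in deltas]
print(features)  # [[[2.0, 2.0]]]
```

Each delta frame here shrinks from eight values to two. Unlike global PCA, this compression needs no training data, since every output value depends only on a small local neighborhood of pixels.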

