
Collaborating Authors

Kanade, Takeo


Emotional Expression Classification using Time-Series Kernels

arXiv.org Machine Learning

Estimation of facial expressions, as spatio-temporal processes, can take advantage of kernel methods if one considers facial landmark positions and their motion in 3D space. We applied support vector classification with kernels derived from dynamic time-warping similarity measures. We achieved over 99% accuracy (measured by area under the ROC curve) using only the 'motion pattern' of the PCA-compressed representation of the marker-point vector, the so-called shape parameters. Beyond classifying full motion patterns, several expressions were recognized with over 90% accuracy within as few as 5-6 frames of their onset, about 200 milliseconds.
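The kernel construction described in the abstract can be sketched in a few lines: compute a dynamic time-warping (DTW) distance between two landmark time series with the classic dynamic program, then turn it into a Gaussian-style similarity for use in a support vector machine. This is an illustrative sketch, not the paper's implementation; the function names are hypothetical, and note that kernels built from DTW distances are not guaranteed positive semi-definite.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time-warping distance between two 1-D sequences (classic DP)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dtw_kernel(a, b, gamma=1.0):
    """Gaussian-style similarity from the DTW distance (not guaranteed PSD)."""
    return np.exp(-gamma * dtw_distance(a, b))
```

A precomputed Gram matrix of such similarities can then be passed to any SVM solver that accepts custom kernels.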


Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces

Neural Information Processing Systems

There has been a recent push toward extracting the 3D spatial layout of scenes. However, none of these approaches models the 3D interaction between objects and the spatial layout. In this paper, we argue for a parametric representation of objects in 3D, which allows us to incorporate volumetric constraints of the physical world. We show that augmenting current structured prediction techniques with volumetric reasoning significantly improves on the state of the art.
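One of the simplest volumetric constraints of the kind the abstract alludes to is that two solid objects cannot occupy intersecting volumes. A minimal sketch, assuming objects are approximated as axis-aligned 3-D boxes (the paper's actual parametric representation and inference procedure are not reproduced here):

```python
def boxes_intersect(a, b):
    """Axis-aligned 3-D boxes given as (min_xyz, max_xyz) corner tuples.
    Returns True if the two volumes overlap; a layout hypothesis that
    places two solid objects in intersecting volumes would violate the
    constraint and could be penalized or pruned."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[i] < bmax[i] and bmin[i] < amax[i] for i in range(3))
```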


Nonrigid Structure from Motion in Trajectory Space

Neural Information Processing Systems

Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this lateral approach is that we do not need to estimate any basis vectors during computation. Instead, we show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) bases, can be used to effectively describe most real motions. This results in a significant reduction in unknowns, and a corresponding gain in stability, in estimation. We report empirical performance, quantitatively using motion capture data and qualitatively on several video sequences exhibiting nonrigid motions including piecewise rigid motion, articulated motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing).
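The core idea of a generic trajectory basis can be illustrated on a single coordinate: build the first K DCT (type-II) basis vectors over F frames and fit a trajectory to them by least squares. Smooth motions are captured by only a few coefficients. This is an illustrative sketch with hypothetical function names, not the paper's full factorization.

```python
import numpy as np

def dct_basis(F, K):
    """First K DCT (type-II) basis vectors of length F, as columns of a
    (F, K) matrix; K << F suffices for smooth trajectories."""
    n = np.arange(F)
    return np.array([np.cos(np.pi * (n + 0.5) * k / F) for k in range(K)]).T

def fit_trajectory(x, K):
    """Least-squares reconstruction of trajectory x in the truncated basis."""
    B = dct_basis(len(x), K)
    coeffs, *_ = np.linalg.lstsq(B, x, rcond=None)
    return B @ coeffs
```

With K equal to the number of frames the representation is exact; the reduction in unknowns comes from truncating to a small K.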


Human Face Detection in Visual Scenes

Neural Information Processing Systems

We present a neural network-based face detection system. A retinally connected neural network examines small windows of an image, and decides whether each window contains a face. The system arbitrates between multiple networks to improve performance over a single network. We use a bootstrap algorithm for training, which adds false detections into the training set as training progresses. This eliminates the difficult task of manually selecting non-face training examples, which must be chosen to span the entire space of non-face images.
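The scanning and bootstrapping scheme described above can be sketched independently of the network itself: slide a window over each image, run a classifier on every window, and harvest the false detections on face-free images as new negative training examples. The window size, stride, and stand-in classifier below are hypothetical; the sketch only illustrates the bootstrap loop's data flow.

```python
import numpy as np

def sliding_windows(image, size=4, stride=2):
    """Yield (row, col, window) for every size x size window in a 2-D image."""
    H, W = image.shape
    for r in range(0, H - size + 1, stride):
        for c in range(0, W - size + 1, stride):
            yield r, c, image[r:r + size, c:c + size]

def bootstrap_negatives(images, classify, labels):
    """Collect windows the classifier wrongly flags as faces on images known
    to contain none; in the paper's scheme these false detections are added
    to the negative training set and the network is retrained."""
    negatives = []
    for img, has_face in zip(images, labels):
        for r, c, win in sliding_windows(img):
            if classify(win) and not has_face:
                negatives.append(win)
    return negatives
```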


A Framework for Representing and Reasoning about Three-Dimensional Objects for Vision

AI Magazine

The capabilities for representing and reasoning about three-dimensional (3-D) objects are essential for knowledge-based 3-D photointerpretation systems that combine domain knowledge with image processing, as demonstrated by 3-D Mosaic and ACRONYM. Three-dimensional representation of objects is also necessary for many other applications, such as robot navigation and 3-D change detection. Geometric reasoning is especially important because geometric relationships between object parts are a rich source of domain knowledge. A practical framework for geometric representation and reasoning must incorporate projections between a two-dimensional (2-D) image and a 3-D scene, shape and surface properties of objects, and geometric and topological relationships between objects. In addition, it should allow easy modification and extension of the system's domain knowledge and be flexible enough to organize its reasoning efficiently to take advantage of the currently available knowledge. We are developing such a framework: the Frame-based Object Recognition and Modeling (3-D FORM) System. This system uses frames to represent objects such as buildings and walls, geometric features such as lines and planes, and geometric relationships such as parallel lines. Active procedures attached to the frames dynamically compute values as needed. Because the order of processing is controlled largely by the order of slot access, the system performs both top-down and bottom-up reasoning, depending on the currently available knowledge. The FORM system is being implemented in Common Lisp with the Framekit tool built at Carnegie Mellon University (Carbonell and Joseph 1986). To date, it has been applied to two types of geometric reasoning problems: interpreting 3-D wire-frame data and solving sets of geometric constraints.
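The frame mechanism the abstract describes, slots whose values are computed on demand by attached procedures, can be sketched compactly. The `Frame` class and the wall example below are hypothetical illustrations (the actual system was built on Framekit in Common Lisp, not Python); they show only the "if-needed" access-driven computation that lets processing order follow slot access.

```python
class Frame:
    """Minimal frame: named slots, each holding either a stored value or an
    attached procedure that computes the value on first access."""
    def __init__(self, **slots):
        self._slots = dict(slots)

    def get(self, name):
        value = self._slots[name]
        if callable(value):            # attached procedure: run on demand
            value = value(self)
            self._slots[name] = value  # cache so later accesses are stored
        return value

# Hypothetical example: a wall frame whose area is derived when accessed.
wall = Frame(height=3.0, width=5.0,
             area=lambda f: f.get("height") * f.get("width"))
```

Because derived slots pull in the slots they depend on, asking for a high-level value triggers top-down computation, while filling in low-level slots first supports bottom-up reasoning.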