Color segmentation is a challenging subtask in computer vision. Most popular approaches are computationally expensive, involve an extensive off-line training phase, and/or rely on a stationary camera. This paper presents an approach for color learning on-board a legged robot with limited computational and memory resources. A key feature of the approach is that it works without any labeled training data: instead, it trains autonomously from a color-coded model of its environment. The process is fully implemented, completely autonomous, and provides a high degree of segmentation accuracy.
In the middle-left of the screen is a view of the background image. This is a grayscale image of the scene that was captured, once, when the video first started playing, before the hand entered the frame. The background image, grayBg, stores the first valid frame of video; this is performed in the line grayBg = grayImage;. A boolean latch (bLearnBackground) prevents this from happening repeatedly on subsequent frames. However, this latch is reset if the user presses a key. It is absolutely essential that your system "learn the background" while your subject (such as the hand) is out of the frame; otherwise, your subject will be impossible to detect properly!
The Animate Agent Project at the University of Chicago is an ongoing effort to explore the mechanisms underlying intelligent, goal-directed behavior. Our research strategy centers on the development of an autonomous robot that performs useful tasks in a real environment, with natural human instruction and feedback, as a way of researching the links between perception, action, and intelligent control. Robust and timely perception is fundamental to the intelligent behavior we are working to achieve. As others have done before us (Bajcsy 1988; Ullman 1984; Chapman 1991; Ballard 1991; Aloimonos 1990), we have observed that a tight link between the perceptual and control systems enables perception to be well tuned to the context: the task, the environment, and the state of the perceiving agent (or robot, in our case). As a result, perception can be more robust and efficient, and these links can also provide elegant solutions to issues such as grounding symbols in the plans of the control system. Our computer vision research concerns the problems of identifying relevant contextual constraints that can be brought to bear on more or less traditional computer vision problems, and applying these constraints effectively in a real-time system. We have demonstrated that certain vision problems which have proven difficult or intractable can be solved robustly and efficiently if enough is known about the specific contexts in which they occur (Prokopowicz, Swain, & Kahn 1994; Firby et al. 1995). This context includes what the robot is trying to do, its current state, and its knowledge of what it expects to see in the situation. The difficulty lies not so much in identifying the types of knowledge that can be used in different situations, but in applying that knowledge to the interpretation of images.
Tracking People with Integrated Stereo, Color, and Face Detection

Abstract
We present an approach to robust, real-time person tracking in crowded and/or unknown environments using multimodal integration. We combine stereo, color, and face detection modules into a single robust system, and show an initial application for an interactive display where the user sees his face distorted into various comic poses in real-time. Stereo processing is used to isolate the figure of a user from other objects and people in the background. Skin-hue classification identifies and tracks likely body parts within the foreground region, and face pattern detection discriminates and localizes the face within the tracked body parts. We discuss the failure modes of these individual components, and report results with the complete system in trials with thousands of users.

Introduction
The creation of displays or environments which passively observe and react to people is an exciting challenge for computer vision.