This past May I worked with the Internet Archive's Television News Archive to apply Google's suite of cloud AI APIs to a week of television news coverage, examining how AI "sees" television and what insights we might gain into the world of non-consumptive, deep learning-powered video understanding. Using Google's video, image, speech and natural language APIs as lenses, the analysis produced more than 600GB of machine annotations tracing how today's deep learning algorithms understand video. What lessons can we learn about the state of AI today, and how can it be applied in creative ways to catalog and explore the vast world of video? Working with the Internet Archive's Television News Archive, a week of television news was selected covering CNN, MSNBC and Fox News, along with the morning and evening broadcasts of San Francisco affiliates KGO (ABC), KPIX (CBS), KNTV (NBC) and KQED (PBS), from April 15 to April 22, 2019, totaling 812 hours of television news. This week was selected because it contained two major stories: one national (the release of the Mueller report on April 18th) and one international (the Notre Dame fire on April 15th).
Automatic understanding of human affect using visual signals is a problem that has attracted significant interest over the past 20 years. However, human emotional states are quite complex. To appraise such states displayed in real-world settings, we need expressive emotional descriptors that are capable of capturing and describing this complexity. The circumplex model of affect, which is described in terms of valence (i.e., how positive or negative an emotion is) and arousal (i.e., the power of the activation of the emotion), can be used for this purpose. Recent progress in the emotion recognition domain has been achieved through the development of deep neural architectures and the availability of very large training databases. To this end, Aff-Wild was the first large-scale "in-the-wild" database, containing around 1,200,000 frames. In this paper, we build upon this database, extending it with 260 more subjects and 1,413,000 new video frames. We call the union of Aff-Wild with the additional data Aff-Wild2. The videos are downloaded from YouTube and have large variations in pose, age, illumination conditions, ethnicity and profession. Both database-specific and cross-database experiments are performed in this paper, utilizing Aff-Wild2 along with the RECOLA database. The developed deep neural architectures are based on the joint training of state-of-the-art convolutional and recurrent neural networks with an attention mechanism, thus exploiting the invariant properties of convolutional features while modeling, via the recurrent layers, the temporal dynamics that arise in human behaviour. The obtained results show promise for the utilization of the extended Aff-Wild, as well as of the developed deep neural architectures, for visual analysis of human behaviour in terms of continuous emotion dimensions.
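The joint CNN-RNN-with-attention design described above can be sketched schematically. The following is a minimal numpy illustration, not the paper's actual architecture: a random projection stands in for the pretrained convolutional feature extractor, a plain tanh recurrence stands in for the recurrent layers, and soft attention pools the hidden states before a linear head regresses valence and arousal into [-1, 1]. All dimensions and weights here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 16 video frames, 512-d "CNN" features, 64-d hidden state.
T, FEAT, HID = 16, 512, 64

def extract_features(frames):
    """Stand-in for a pretrained CNN: a fixed random projection of flattened frames."""
    W = rng.standard_normal((frames.shape[1], FEAT)) / np.sqrt(frames.shape[1])
    return frames @ W                          # (T, FEAT)

def rnn_with_attention(feats):
    """Tanh recurrence over per-frame features, then soft attention over hidden states."""
    Wx = rng.standard_normal((FEAT, HID)) / np.sqrt(FEAT)
    Wh = rng.standard_normal((HID, HID)) / np.sqrt(HID)
    h = np.zeros(HID)
    states = []
    for x in feats:
        h = np.tanh(x @ Wx + h @ Wh)           # recurrent update models temporal dynamics
        states.append(h)
    H = np.stack(states)                       # (T, HID)
    scores = H @ rng.standard_normal(HID)      # one attention logit per frame
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                       # softmax attention weights
    return alpha @ H                           # attention-pooled summary vector

def predict_valence_arousal(frames):
    pooled = rnn_with_attention(extract_features(frames))
    W_out = rng.standard_normal((HID, 2)) / np.sqrt(HID)
    return np.tanh(pooled @ W_out)             # (valence, arousal), each in [-1, 1]

frames = rng.standard_normal((T, 3 * 32 * 32))  # 16 tiny flattened RGB frames
va = predict_valence_arousal(frames)
```

In a trained system the weights would of course be learned jointly end to end; the sketch only shows how the convolutional, recurrent and attention stages fit together.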
DIS is known for its box office hits: Beauty and the Beast, Rogue One: A Star Wars Story, and Captain America: Civil War, just to name a few. As one of the biggest media conglomerates in the world, Disney is looking to better understand its moviegoing audience so that its upcoming movie line-up can continue to produce moneymakers and crowd pleasers. Disney hopes to do this through artificial intelligence (AI) and facial recognition technology, using deep learning techniques to track the facial expressions of an audience watching a movie in order to gauge its emotional reactions. The researchers call the technology "factorized variational autoencoders," or FVAEs, and say it works so well that after observing an audience member's face for just 10 minutes, it can predict how that person will react to the rest of the movie. The FVAEs then recognize many facial expressions from movie viewers on their own, such as smiles and laughter, and can make connections between different viewers to see whether a particular movie is getting the desired reaction at the right place and time.
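This is not Disney's actual FVAE system, but the factorization idea behind that "watch 10 minutes, predict the rest" claim can be illustrated with a plain low-rank matrix-completion sketch (all names and numbers below are hypothetical): reactions form a viewers-by-timesteps matrix, most of the audience is observed for the whole movie, and a few viewers observed only briefly have their remaining reactions filled in from the shared time factors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 30 viewers x 40 movie timesteps; true reactions are rank-2.
viewers, steps, rank = 30, 40, 2
true_R = rng.standard_normal((viewers, rank)) @ rng.standard_normal((rank, steps))
true_R /= np.abs(true_R).max()         # scale reaction scores into [-1, 1]

# Most viewers are observed for the whole movie; the last five are observed
# only for their first 10 timesteps ("10 minutes"), and we predict the rest.
observed = np.ones((viewers, steps), dtype=bool)
observed[-5:, 10:] = False

# Plain gradient descent on the observed entries of R ~ U @ V.
U = 0.1 * rng.standard_normal((viewers, rank))
V = 0.1 * rng.standard_normal((rank, steps))
lr = 0.002

def loss():
    return float(np.mean(((U @ V - true_R) * observed) ** 2))

initial = loss()
for _ in range(5000):
    err = (U @ V - true_R) * observed  # residual only where reactions were seen
    U -= lr * err @ V.T
    V -= lr * U.T @ err

predicted = U @ V                      # includes the never-observed block
```

The briefly observed viewers get useful predictions only because the time factors `V` are learned from the rest of the audience; the real FVAE additionally learns a nonlinear variational encoder rather than a linear factorization.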
Deep learning is increasingly capable of assessing the emotion of human faces, looking across an image to estimate how happy or sad the people in it appear to be. What if this could be applied to television news, estimating the average emotion of all of the human faces seen on the news over the course of a week? While AI-based facial sentiment assessment is still very much an active area of research, an experiment using Google's cloud AI to analyze a week's worth of television news coverage from the Internet Archive's Television News Archive demonstrates that, even within the limitations of today's tools, a great deal can be learned about the visual sentiment of television news. To better understand the facial emotion of television, CNN, MSNBC and Fox News and the morning and evening broadcasts of San Francisco affiliates KGO (ABC), KPIX (CBS), KNTV (NBC) and KQED (PBS) from April 15 to April 22, 2019, totaling 812 hours of television news, were analyzed using Google's Vision AI image understanding API with all of its features enabled, including facial detection. Facial detection is very different from facial recognition: detection merely locates faces in an image and characterizes attributes such as expression, while recognition identifies specific individuals.
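One practical wrinkle in "averaging" facial emotion this way: the Vision API's FACE_DETECTION feature reports emotion not as numeric scores but as likelihood buckets (fields like joyLikelihood and sorrowLikelihood, with values from VERY_UNLIKELY to VERY_LIKELY), so the buckets must first be mapped onto a numeric scale before averaging. A minimal sketch of that aggregation step follows; the particular bucket-to-number mapping is one reasonable choice of ours, not an official one, and the sample faces are hypothetical.

```python
# Vision AI likelihood buckets mapped to a numeric scale (the scale is our choice).
LIKELIHOOD_SCORE = {
    "VERY_UNLIKELY": 0.0,
    "UNLIKELY": 0.25,
    "POSSIBLE": 0.5,
    "LIKELY": 0.75,
    "VERY_LIKELY": 1.0,
    "UNKNOWN": 0.0,
}

def average_emotions(face_annotations):
    """Average joy/sorrow/anger/surprise scores across all detected faces.

    `face_annotations` mimics the faceAnnotations list returned by the
    Vision API's FACE_DETECTION feature (dicts with *Likelihood fields).
    """
    emotions = ("joy", "sorrow", "anger", "surprise")
    totals = {e: 0.0 for e in emotions}
    for face in face_annotations:
        for e in emotions:
            totals[e] += LIKELIHOOD_SCORE[face.get(e + "Likelihood", "UNKNOWN")]
    n = max(len(face_annotations), 1)
    return {e: totals[e] / n for e in emotions}

# Two hypothetical faces detected in one frame of news footage.
faces = [
    {"joyLikelihood": "VERY_LIKELY", "sorrowLikelihood": "VERY_UNLIKELY"},
    {"joyLikelihood": "POSSIBLE", "sorrowLikelihood": "UNLIKELY"},
]
avg = average_emotions(faces)  # joy = (1.0 + 0.5) / 2 = 0.75
```

Run over every sampled frame of the 812 hours, averages like these are what make a week-level "emotion of television" figure possible.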
In today's blog post you are going to learn how to perform face recognition in both images and video streams using OpenCV, Python, and deep learning. As we'll see, the deep learning-based facial embeddings we'll be using here today are both (1) highly accurate and (2) capable of being executed in real-time. We'll start with a brief discussion of how deep learning-based facial recognition works, including the concept of "deep metric learning". From there, I will help you install the libraries you need to actually perform face recognition. Finally, we'll implement face recognition for both still images and video streams. To learn more about face recognition with OpenCV, Python, and deep learning, just keep reading!
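The "deep metric learning" idea underlying this tutorial is that a network maps each face to a 128-d embedding such that faces of the same person lie close together, so recognition reduces to a nearest-neighbor comparison of embeddings. Here is a sketch of just that comparison step using synthetic embeddings in place of real network outputs; the 0.6 distance threshold follows the common dlib convention, and the names are hypothetical.

```python
import numpy as np

def match_face(query, known_embeddings, known_names, threshold=0.6):
    """Return the name whose embedding is nearest to `query`, or None.

    Embeddings are assumed to be 128-d vectors from a deep metric
    learning network; same-person pairs should fall under `threshold`
    in Euclidean distance (0.6 is the common dlib convention).
    """
    dists = np.linalg.norm(known_embeddings - query, axis=1)
    best = int(np.argmin(dists))
    return known_names[best] if dists[best] < threshold else None

# Synthetic stand-ins for real network outputs.
rng = np.random.default_rng(42)
alice = rng.standard_normal(128) * 0.05
bob = alice + 1.0                      # far from alice in embedding space
known = np.stack([alice, bob])
names = ["alice", "bob"]

probe = alice + rng.standard_normal(128) * 0.01   # noisy re-capture of alice
```

In the tutorial itself, `query` and `known_embeddings` would come from running the face embedding network on cropped faces; everything downstream of the network is just this distance comparison, which is why it runs in real time.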