This past May I worked with the Internet Archive's Television News Archive to apply Google's suite of cloud AI APIs to a week of television news coverage, examining how AI "sees" television and what insights we might gain into the world of non-consumptive, deep learning-powered video understanding. Using Google's video, image, speech and natural language APIs as lenses, more than 600GB of machine annotations trace how today's deep learning algorithms understand video. What lessons can we learn about the state of AI today, and how can it be applied in creative ways to catalog and explore the vast world of video? The selected week covered CNN, MSNBC and Fox News, along with the morning and evening broadcasts of San Francisco affiliates KGO (ABC), KPIX (CBS), KNTV (NBC) and KQED (PBS), from April 15 to April 22, 2019, totaling 812 hours of television news. This week was chosen because it contained two major stories: one national (the release of the Mueller report on April 18) and one international (the Notre Dame fire on April 15).
This blog is syndicated from The New Rules of Privacy: Building Loyalty with Connected Consumers in the Age of Face Recognition and AI. Since the invention of face recognition in the 1960s, has any single technology sparked more fascination for public safety officials, companies, journalists and Hollywood? When people learn that I'm the CEO of a face recognition company, they commonly reference its fictional use in shows like CSI, Black Mirror or even films such as the 1980s James Bond movie A View to a Kill. Most often, however, they mention Minority Report starring Tom Cruise.
Automatic understanding of human affect from visual signals has attracted significant interest over the past 20 years. However, human emotional states are complex. To appraise such states as they are displayed in real-world settings, we need expressive emotional descriptors capable of capturing and describing this complexity. The circumplex model of affect, which describes emotion in terms of valence (how positive or negative an emotion is) and arousal (how strongly the emotion is activated), can serve this purpose. Recent progress in emotion recognition has been driven by the development of deep neural architectures and the availability of very large training databases. Aff-Wild was the first large-scale "in-the-wild" database for this task, containing around 1,200,000 frames. In this paper, we build upon that database, extending it with 260 more subjects and 1,413,000 new video frames. We call the union of Aff-Wild and the additional data Aff-Wild2. The videos were downloaded from YouTube and show large variations in pose, age, illumination conditions, ethnicity and profession. We perform both database-specific and cross-database experiments, using Aff-Wild2 along with the RECOLA database. The developed deep neural architectures are based on the joint training of state-of-the-art convolutional and recurrent neural networks with an attention mechanism, exploiting the invariant properties of convolutional features while modeling the temporal dynamics of human behaviour through the recurrent layers. The obtained results show promise for using the extended Aff-Wild, as well as the developed deep neural architectures, for visual analysis of human behaviour in terms of continuous emotion dimensions.
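To make the "convolutional plus recurrent with attention" description more concrete, here is a minimal pure-Python sketch of temporal soft-attention pooling. It assumes the per-frame hidden states have already been produced by a CNN-RNN stack. The attention vector `w`, the head weights `w_val` and `w_aro`, and all numbers are hypothetical placeholders, not the paper's parameters. The abstract does not specify the exact attention formulation, so this shows one generic variant (dot-product scores, softmax, weighted sum) feeding a `tanh` head that outputs valence and arousal in [-1, 1], the range commonly used for such annotations.

```python
import math
from typing import List, Sequence, Tuple

# Generic temporal soft-attention pooling (a sketch, not the paper's exact layer).
# `hidden` stands in for per-frame outputs of a CNN + RNN stack; all weights are
# hypothetical placeholders rather than trained parameters.


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(hidden: Sequence[Sequence[float]],
                   w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Score each time step against w, normalise the scores, return the weighted sum."""
    weights = softmax([dot(h, w) for h in hidden])
    dim = len(hidden[0])
    context = [sum(a * h[i] for a, h in zip(weights, hidden)) for i in range(dim)]
    return context, weights


def predict_valence_arousal(context: Sequence[float],
                            w_val: Sequence[float],
                            w_aro: Sequence[float]) -> Tuple[float, float]:
    """Linear heads squashed by tanh so both outputs lie in [-1, 1]."""
    return math.tanh(dot(context, w_val)), math.tanh(dot(context, w_aro))


if __name__ == "__main__":
    # Three frames of hypothetical 2-D hidden states.
    hidden = [[0.1, 0.2], [0.9, -0.3], [0.4, 0.5]]
    context, weights = attention_pool(hidden, w=[1.0, 0.0])
    print(weights)  # the second frame scores highest, so it gets the largest weight
    print(predict_valence_arousal(context, w_val=[1.0, 0.0], w_aro=[0.0, 1.0]))
```

Frames whose hidden states score highly against `w` dominate the pooled context, which lets a model of this kind weight the most expressive moments of a clip more heavily than neutral ones.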
Disney (DIS) is known for box office hits such as Beauty and the Beast, Rogue One: A Star Wars Story and Captain America: Civil War. As one of the biggest media conglomerates in the world, Disney wants to better understand its moviegoing audience so that its upcoming line-up continues to make money and please crowds. It hopes to do this through artificial intelligence (AI) and facial recognition technology, using deep learning techniques to track the facial expressions of an audience watching a movie in order to gauge their emotional reactions to it. The researchers say the technology, called "factorized variational autoencoders" (FVAEs), works so well that after observing an audience member's face for just 10 minutes, it can predict how that person will react to the rest of the movie. The FVAEs then recognize many facial expressions from movie viewers on their own, such as smiles and laughter, and can make connections between different viewers to see whether a particular movie is getting the intended reaction at the right place and time.
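As we understand it, FVAEs are a nonlinear, variational take on matrix/tensor factorization: each viewer and each moment of the film get their own latent factors, and a decoder maps their combination to facial-expression features. The sketch below keeps only a linear factorization skeleton to show how "watch 10 minutes, predict the rest" can work. It assumes per-moment factors `V_t` were already learned from other viewers, represents a viewer's reaction as one hypothetical scalar per time step (for example a smile score), fits that viewer's 2-D factor by ridge regression on the observed prefix, and extrapolates. It is not Disney's model; the real FVAE replaces the dot product with a neural decoder and uses variational inference.

```python
from typing import List, Sequence

# Hypothetical linear stand-in for the FVAE idea (not Disney's model):
# reaction[viewer, t] ~= dot(u_viewer, v_t), with the per-moment factors v_t
# assumed to be learned beforehand from many other viewers.


def fit_viewer_factor(time_factors: Sequence[Sequence[float]],
                      observed: Sequence[float],
                      lam: float = 0.1) -> List[float]:
    """Ridge fit of a 2-D viewer factor u from the observed prefix.

    Minimizes sum_t (y_t - u . v_t)^2 + lam * |u|^2 by solving the 2x2
    normal equations (V^T V + lam I) u = V^T y in closed form.
    """
    a = lam + sum(v[0] * v[0] for v in time_factors)
    b = sum(v[0] * v[1] for v in time_factors)
    d = lam + sum(v[1] * v[1] for v in time_factors)
    r0 = sum(v[0] * y for v, y in zip(time_factors, observed))
    r1 = sum(v[1] * y for v, y in zip(time_factors, observed))
    det = a * d - b * b  # positive whenever lam > 0
    return [(d * r0 - b * r1) / det, (a * r1 - b * r0) / det]


def predict_rest(time_factors: Sequence[Sequence[float]],
                 observed: Sequence[float],
                 lam: float = 0.1) -> List[float]:
    """Fit on the first len(observed) moments, then predict every later moment."""
    n = len(observed)
    u = fit_viewer_factor(time_factors[:n], observed, lam)
    return [u[0] * v[0] + u[1] * v[1] for v in time_factors[n:]]


if __name__ == "__main__":
    # Toy data: five moments of a film and a viewer whose true factor is [0.5, -0.2].
    V = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
    watched = [0.5, -0.2, 0.3]  # reactions during the "first 10 minutes"
    print(predict_rest(V, watched, lam=1e-9))  # close to [0.8, 0.1]
```

The design point is that the moment factors are shared across the whole audience, so a short prefix only has to pin down the small per-viewer vector; everything the model knows about later scenes comes from other viewers.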
Tech and entertainment companies are betting big on facial recognition technology, and Disney wants to be the cool kid on the block. The company's research team is using deep learning techniques to track the facial expressions of an audience watching movies in order to assess their emotional reactions. Called "factorised variational autoencoders" (FVAEs), the new algorithm is reportedly sharp enough to predict how a member of the audience will react to the rest of a film after analysing their facial expressions for just 10 minutes. In a more sophisticated take on the recommendation systems used for online shopping by Amazon, which suggest new products based on your shopping history, the FVAEs recognise a series of facial expressions from the audience, such as smiles and laughter. They then make connections between viewers to see whether a certain movie is getting the wanted reactions at the right place and time.
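The Amazon comparison points at collaborative filtering. The simplest version of "making connections between viewers" is user-based nearest-neighbour filtering, sketched below under stated assumptions: each viewer's reactions are a hypothetical per-moment score list, similarity is cosine similarity over the opening moments both have watched, and a newcomer's remaining reactions are predicted as the similarity-weighted average of the closest earlier viewers' traces. This is a swapped-in classic technique for illustration, not how the FVAE works.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def predict_from_neighbours(prefix: Sequence[float],
                            others: Dict[str, Sequence[float]],
                            k: int = 2) -> List[float]:
    """User-based collaborative filtering over reaction traces (illustrative only).

    prefix: the new viewer's scores for the moments watched so far.
    others: hypothetical full traces from earlier viewers, keyed by viewer id.
    Returns predicted scores for the moments after the prefix.
    """
    n = len(prefix)
    # Rank earlier viewers by how similar their opening moments are to the newcomer's.
    ranked = sorted(((cosine(prefix, trace[:n]), vid) for vid, trace in others.items()),
                    reverse=True)
    # Keep the top k, dropping anyone with no positive similarity.
    neighbours = [(s, vid) for s, vid in ranked[:k] if s > 0]
    if not neighbours:
        return []
    total = sum(s for s, _ in neighbours)
    # Only predict as far as every chosen neighbour's trace extends.
    horizon = min(len(others[vid]) for _, vid in neighbours) - n
    return [sum(s * others[vid][n + t] for s, vid in neighbours) / total
            for t in range(horizon)]


if __name__ == "__main__":
    history = {"a": [1, 0, 1, 0.9, 0.8],
               "b": [0, 1, 0, 0.1, 0.2],
               "c": [1, 0, 0.9, 0.7, 0.6]}
    print(predict_from_neighbours([1, 0, 1], history))  # roughly [0.8, 0.7]
```

Viewers "a" and "c" reacted to the opening much like the newcomer, so their later reactions drive the prediction, while "b" is ignored.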