Significant advances have been made during the past few years in the ability of artificial intelligence (AI) systems to recognize and analyze human emotion and sentiment, owing in large part to accelerated access to data (primarily social media feeds and digital video), cheaper compute power, and evolving deep learning capabilities combined with natural language processing (NLP) and computer vision. According to a new report from Tractica, these trends are beginning to drive growth in the market for sentiment and emotion analysis software. Tractica forecasts that worldwide revenue from sentiment and emotion analysis software will increase from $123 million in 2017 to $3.8 billion by 2025. The market intelligence firm anticipates that this growth will be driven by several key industries, including retail, advertising, business services, healthcare, and gaming. According to Tractica's analysis, the top use case categories for sentiment and emotion analysis will be as follows: "A better understanding of human emotion will help AI technology create more empathetic customer and healthcare experiences, drive our cars, enhance teaching methods, and figure out ways to build better products that meet our needs," says principal analyst Mark Beccue.
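To put the forecast in perspective, you can compute the compound annual growth rate (CAGR) directly from the two figures quoted above. The sketch assumes eight compounding years (2017 to 2025). The CAGR is derived here from Tractica's start and end values. It is not a number stated in the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start value and an end value."""
    # CAGR = (end / start) ** (1 / years) - 1
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Tractica figures in millions of US dollars: $123M (2017) -> $3.8B (2025).
start_musd = 123.0
end_musd = 3800.0
rate = cagr(start_musd, end_musd, 2025 - 2017)
print(f"Implied CAGR: {rate:.1%}")  # prints "Implied CAGR: 53.5%"
```

In other words, the forecast implies revenue growing by roughly half again every year for eight consecutive years.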
Most of the systems on the market score around 55–65% accuracy on unseen data, even though they may reach 85% accuracy in cross-validation. At this juncture, it's important to realize that sentiment analysis is critical for any system that monitors customer reviews or social media posts. Hardly had the business world caught up with sentence-level sentiment analysis when we began moving to aspect-level sentiment analysis, which is more directed and granular and adds to the complexity. The question is this: can we do something to augment our sentiment analysis? For the past few months, I have been using context and relationship extraction to augment sentiment analysis.
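The following toy sketch makes the difference between sentence-level and aspect-level sentiment concrete. The lexicons, the aspect list, and the "assign each opinion word to the nearest aspect" heuristic are all hypothetical simplifications chosen for illustration. This is not the context and relationship extraction approach mentioned above. Real aspect-level systems typically rely on parsers or learned models rather than word distance.

```python
import re

# Hypothetical toy lexicons; real systems use curated resources or learned models.
POSITIVE = {"great", "excellent", "fast", "friendly", "tasty"}
NEGATIVE = {"slow", "rude", "cold", "terrible", "bland"}
ASPECTS = {"food", "service", "price", "delivery"}


def sentence_sentiment(tokens: list[str]) -> int:
    """Sentence-level score: positive word count minus negative word count."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def aspect_sentiment(tokens: list[str]) -> dict[str, int]:
    """Aspect-level scores: each opinion word counts toward the nearest aspect term."""
    aspect_positions = [i for i, t in enumerate(tokens) if t in ASPECTS]
    scores = {tokens[i]: 0 for i in aspect_positions}
    if not aspect_positions:
        return scores
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity:
            # Naive heuristic: attach the opinion word to the closest aspect by position.
            nearest = min(aspect_positions, key=lambda p: abs(p - i))
            scores[tokens[nearest]] += polarity
    return scores


review = "The food was tasty but the service was slow and rude."
tokens = re.findall(r"[a-z]+", review.lower())
print(sentence_sentiment(tokens))  # -1: one overall score hides the mixed opinion
print(aspect_sentiment(tokens))    # {'food': 1, 'service': -2}
```

A single sentence-level score labels this review mildly negative. The aspect-level view shows that the customer liked the food but disliked the service, which is the extra granularity (and complexity) referred to above.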
We propose a novel approach to multimodal sentiment analysis using deep neural networks that combine visual analysis and natural language processing. Our goal is different from the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags that users attach to their Tumblr posts, treating these as "self-reported emotions." We demonstrate that our multimodal model, which combines text and image features, outperforms separate models based solely on either images or text. Our model's results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model, compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work provides a useful tool for the growing academic study of images (both photographs and memes) on social networks.
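One simple way to combine modalities is early fusion. You concatenate a text feature vector and an image feature vector, then pass the result to a single classifier over emotion tags. The sketch below illustrates only that general idea. The emotion tags, feature sizes, and random untrained weights are hypothetical stand-ins. The sketch does not reproduce the architecture of the model described in the abstract.

```python
import math
import random

# Hypothetical emotion tags standing in for the user-supplied Tumblr tags.
EMOTIONS = ["happy", "sad", "angry", "calm"]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(text_vec, image_vec, weights, biases) -> dict[str, float]:
    """Early fusion: concatenate modality features, then apply a linear softmax layer."""
    fused = list(text_vec) + list(image_vec)
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, biases)]
    return dict(zip(EMOTIONS, softmax(logits)))


random.seed(0)
text_features = [random.gauss(0, 1) for _ in range(8)]   # stand-in for a text encoder output
image_features = [random.gauss(0, 1) for _ in range(6)]  # stand-in for an image network embedding
dim = len(text_features) + len(image_features)
W = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in EMOTIONS]  # untrained random weights
b = [0.0] * len(EMOTIONS)

probs = fuse_and_classify(text_features, image_features, W, b)
print(max(probs, key=probs.get), probs)
```

In a trained system, the feature vectors would come from learned text and image encoders, and the weights would be fit to the self-reported tags. Concatenation is only one fusion strategy. Late fusion, which combines the predictions of separate per-modality models, is a common alternative.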