cobe
COBE: Contextualized Object Embeddings from Narrated Instructional Video
Many objects in the real world undergo dramatic variations in visual appearance. For example, a tomato may be red or green, sliced or chopped, fresh or fried, liquid or solid. Training a single detector to accurately recognize tomatoes in all these different states is challenging. On the other hand, contextual cues (e.g., the presence of a knife, a cutting board, a strainer or a pan) are often strongly indicative of how the object appears in the scene. Recognizing such contextual cues is useful not only to improve the accuracy of object detection or to determine the state of the object, but also to understand its functional properties and to infer ongoing or upcoming human-object interactions.
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Review for NeurIPS paper: COBE: Contextualized Object Embeddings from Narrated Instructional Video
While this algorithm is specifically designed for detectors, Miech et al. (2019) used unsupervised NCE losses (much like the ones in this paper) to understand the natural language descriptions associated with videos; the algorithm presented here seems like the most straightforward extension of this idea to bounding boxes. Little attention is given to demonstrating that the use of bounding boxes fundamentally changes the problem. Update: The rebuttal addresses the following point regarding the accuracy of the evaluation. I had misunderstood the annotations that are available with EPIC-Kitchens, and therefore I am changing my review. I would encourage the authors to clarify the writing regarding what's available with EPIC-Kitchens.
- Education > Educational Technology > Media (0.40)
- Education > Educational Technology > Audio & Video (0.40)
Mere Contrastive Learning for Cross-Domain Sentiment Analysis
Luo, Yun, Guo, Fang, Liu, Zihan, Zhang, Yue
Cross-domain sentiment analysis aims to predict the sentiment of texts in the target domain using a model trained on the source domain, to cope with the scarcity of labeled data. Previous studies are mostly cross-entropy-based methods for the task, which suffer from instability and poor generalization. In this paper, we explore contrastive learning on the cross-domain sentiment analysis task. We propose a modified contrastive objective with in-batch negative samples so that the sentence representations from the same class will be pushed close while those from different classes become further apart in the latent space. Experiments on two widely used datasets show that our model can achieve state-of-the-art performance in both cross-domain and multi-domain sentiment analysis tasks. Meanwhile, visualizations demonstrate the effectiveness of transferring knowledge learned in the source domain to the target domain, and the adversarial test verifies the robustness of our model.
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
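The contrastive objective with in-batch negatives described in the abstract above can be illustrated with a minimal NumPy sketch. This is a generic supervised contrastive loss, not the authors' modified objective, whose exact form is not given in this excerpt. The function name `in_batch_supervised_contrastive_loss`, the `temperature` value, and the toy embeddings are all assumptions made for illustration.

```python
import numpy as np

def in_batch_supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (illustrative only).

    For each anchor, samples in the batch with the same label are positives
    and all other samples act as in-batch negatives. Assumes a batch of at
    least two samples.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    # L2-normalize so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Exclude each anchor's similarity with itself from the softmax.
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable log-softmax over the other samples in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_denominator

    # Positives: same label as the anchor, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors with no positive are skipped
    if not valid.any():
        return 0.0

    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


# Toy batch: two tight clusters of sentence embeddings (hypothetical values).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_loss = in_batch_supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
misaligned_loss = in_batch_supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy batch the loss should be lower when the labels agree with the clusters (`aligned_loss`) than when they cut across them (`misaligned_loss`), which is the "same class close, different classes apart" behavior the abstract describes.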
Intelligent Technology Can Give Ethical Guidance
Hey Robot, is AI a good thing? Accenture employees can now anonymously ask a new internal chatbot questions on the ethical guidelines for deploying a client's artificial intelligence programs. Called COBE, Accenture's chatbot can also address the proper use of social media or employee interactions in the workplace, the global consulting firm announced Dec. 20. Users can interact with COBE--an acronym for Code of Business Ethics, from which it was transformed--via instant message. Accenture designed COBE as more employees and clients encounter a host of ethical questions amid their increasing work on artificial intelligence, autonomous vehicles, and other new technologies.