Collaborating Authors

 shwartz


Do Vision-Language Models Understand Compound Nouns?

Kumar, Sonal, Ghosh, Sreyan, Sakshi, S, Tyagi, Utkarsh, Manocha, Dinesh

arXiv.org Artificial Intelligence

Open-vocabulary vision-language models (VLMs) like CLIP, trained using contrastive loss, have emerged as a promising new paradigm for text-to-image retrieval. However, do VLMs understand compound nouns (CNs) (e.g., lab coat) as well as they understand nouns (e.g., lab)? We curate Compun, a novel benchmark with 400 unique and commonly used CNs, to evaluate the effectiveness of VLMs in interpreting CNs. The Compun benchmark challenges a VLM on text-to-image retrieval: given a text prompt with a CN, the task is to select the correct image that shows the CN from a set that also contains a pair of distractor images showing the constituent nouns that make up the CN. Next, we perform an in-depth analysis to highlight CLIP's limited understanding of certain types of CNs. Finally, we present an alternative framework that moves beyond the hand-written templates for text prompts widely used by CLIP-like models. We employ a Large Language Model to generate multiple diverse captions that include the CN as an object in the scene described by the caption. Our proposed method improves CN understanding of CLIP by 8.25% on Compun. Code and benchmark are available at: https://github.com/sonalkum/Compun
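
The retrieval task described in this abstract can be illustrated with a minimal sketch. It assumes the text and image embeddings have already been computed by some CLIP-like encoder. The toy vectors, the function names, and the choice to average cosine similarity over several captions are all illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pick_image(caption_embeddings, image_embeddings):
    """Return (index of best image, per-image scores).

    Each image is scored by its mean cosine similarity to all captions.
    Averaging over captions is an assumption made for this sketch.
    """
    scores = [
        sum(cosine(c, img) for c in caption_embeddings) / len(caption_embeddings)
        for img in image_embeddings
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical 3-d embeddings: two captions mentioning the compound noun,
# one image of the compound noun and two distractors (constituent nouns).
captions = [[0.9, 0.1, 0.8], [1.0, 0.0, 0.7]]
images = [
    [1.0, 0.0, 1.0],  # compound noun (e.g., "lab coat")
    [1.0, 0.0, 0.0],  # distractor: first constituent
    [0.0, 1.0, 0.0],  # distractor: second constituent
]

best, scores = pick_image(captions, images)
print(best, [round(s, 3) for s in scores])
```

With a single hand-written prompt, `caption_embeddings` would hold one vector. Passing several caption embeddings mimics the idea of using multiple generated captions per compound noun.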


You Can Now Live Forever. (Your AI-Powered Twin, That Is).

#artificialintelligence

It's January 17, 2020 -- the world has yet to change; Wuhan locks down six days later -- and Emil Jimenez is on a train from Vienna to Prague. "She's like, 'Daddy,' y'know, 'what is this?'" Jimenez tells me on a video call from the Czech Republic. Jimenez tells her it's Siri, and encourages her to talk to the digital assistant. Her first question is whether Siri has a mother. From there, she peppers the artificial intelligence with the kinds of questions kids ask -- do you like ice cream?


Experts: AI Needs Ethics – Hypergrid Business

#artificialintelligence

Artificial intelligence is increasingly becoming a part of our daily lives, both in the workplace and at home. Some AI experts are stressing the need to focus on making AI ethical and keeping it human friendly. Bias in programming, security concerns, and a lack of public knowledge about how AI works are all issues that need to be addressed to develop and maintain a healthy relationship between humans and the technology we use. "This is the year AI ethics become absolutely mandatory functions in most businesses, not just talk," Alex Spinelli, chief technology officer at LivePerson and former global head of Alexa OS for Amazon, told Hypergrid Business. Companies are just starting to consider responsible use of AI as a part of their business model.


Needed: People To Put The Intelligence In Artificial Intelligence

#artificialintelligence

Is the digital workforce ready to take over? Artificial intelligence may be capable of assuming many tasks, but it will be some time, if ever, before it can replace jobs on a widespread basis. It simply has too many limitations. Instead, we need to acquaint a generation of workers with technologies that take on the more mundane, repetitive portions of their jobs, and in turn elevate their decision-making roles within enterprises. That's the word from Steve Shwartz, AI author, researcher and investor, who points out that the notion of AI taking jobs is a myth.


Learning Taxonomies of Concepts and not Words using Contextualized Word Representations: A Position Paper

Schmelzeisen, Lukas, Staab, Steffen

arXiv.org Machine Learning

Taxonomies are semantic hierarchies of concepts. One limitation of current taxonomy learning systems is that they define concepts as single words. This position paper argues that contextualized word representations, which recently achieved state-of-the-art results on many competitive NLP tasks, are a promising method to address this limitation. We outline a novel approach for taxonomy learning that (1) defines concepts as synsets, (2) learns density-based approximations of contextualized word representations, and (3) can measure similarity and hypernymy among them.