Teaching Machines to Describe Images with Natural Language Feedback
Robots will eventually be part of every household. It is thus critical to enable algorithms to learn from and be guided by non-expert users. In this paper, we bring a human in the loop, and enable a human teacher to give feedback to a learning agent in the form of natural language. A descriptive sentence can provide a stronger learning signal than a numeric reward in that it can easily point to where the mistakes are and how to correct them. We focus on the problem of image captioning in which the quality of the output can easily be judged by non-experts. We propose a phrase-based captioning model trained with policy gradients, and design a critic that provides reward to the learner by conditioning on the human-provided feedback. We show that by exploiting descriptive feedback our model learns to perform better than when given independently written human captions.
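The policy-gradient training the abstract describes can be illustrated with a toy REINFORCE update, where a critic-style reward scales the gradient of the sampled action's log-probability. This is a minimal sketch, not the authors' phrase-based model: the three-action "vocabulary", the reward value, and the learning rate are all illustrative assumptions.

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(logits, action, reward, lr=0.1):
    # REINFORCE: grad of log pi(action) w.r.t. logits is (one_hot(action) - probs),
    # scaled by the reward the critic assigned to this sample
    probs = softmax(logits)
    return [l + lr * reward * ((1.0 if i == action else 0.0) - p)
            for i, (l, p) in enumerate(zip(logits, probs))]

# toy policy over three candidate phrases; pretend the feedback-conditioned
# critic rewarded phrase 1
logits = [0.0, 0.0, 0.0]
for _ in range(100):
    logits = reinforce_step(logits, action=1, reward=1.0)

print(softmax(logits))  # probability mass shifts toward action 1
```

With a positive reward, each update pushes probability toward the rewarded phrase; a negative reward (e.g. for a phrase the human feedback flagged as wrong) would push it away.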
Reviews: Teaching Machines to Describe Images with Natural Language Feedback
The paper presents an approach for automatically captioning images in which the model incorporates natural language feedback from humans along with ground-truth captions during training. The proposed approach uses reinforcement learning to train a phrase-based captioning model: the model is first trained with maximum likelihood (supervised learning) and then fine-tuned with reinforcement learning, where the reward is a weighted sum of BLEU scores with respect to the ground-truth captions and the feedback sentences provided by humans. The reward also includes phrase-level rewards obtained from the human feedback. The proposed model is trained and evaluated on the MSCOCO image caption data. It is compared with a pure supervised learning (SL) model and a model trained using reinforcement learning (RL) without any feedback.
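The weighted reward the review describes can be sketched as follows. The clipped unigram precision here is a deliberately simplified stand-in for full BLEU, and `alpha` is an assumed mixing weight, not a value taken from the paper.

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    # clipped n-gram precision over token lists (simplified proxy for BLEU)
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    if not cand:
        return 0.0
    ref_counts = Counter(tuple(reference[i:i + n])
                         for i in range(len(reference) - n + 1))
    cand_counts = Counter(cand)
    matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return matched / len(cand)

def weighted_reward(caption, ground_truth, feedback, alpha=0.5):
    # reward = weighted sum of similarity to the ground-truth caption
    # and similarity to the human feedback sentence
    r_gt = ngram_precision(caption, ground_truth)
    r_fb = ngram_precision(caption, feedback)
    return alpha * r_gt + (1 - alpha) * r_fb
```

Usage: `weighted_reward("a dog runs".split(), "a dog sits".split(), "the dog runs".split())` combines agreement with both references into a single scalar that a policy-gradient learner can maximize.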
Microsoft says its AI can describe images 'as well as people do'
Describing an image accurately, and not just like a clueless robot, has long been a goal of AI. In 2016, Google said its artificial intelligence could caption images almost as well as humans, with 94 percent accuracy. Now Microsoft says it's gone even further: its researchers have built an AI system that's even more accurate than humans -- so much so that it now sits at the top of the leaderboard for the nocaps image captioning benchmark. Microsoft claims it's two times better than the image captioning model it's been using since 2015. It's now offering the new captioning model as part of Azure's Cognitive Services, so any developer can bring it into their apps.
Chrome will use AI to describe images for blind and low-vision users
The internet can be a difficult place to navigate for people who are blind or who have low vision. A large portion of content on the internet is visual, and unless website creators use alt text to label their images, it's hard for users of screen readers or Braille displays to know what they show. To address the issue, Google has announced a new feature for Chrome which will use machine learning to recognize images and offer text descriptions of what they show. It is based on the same technology which lets users search for images by keyword, and the description of the image is auto-generated. "The unfortunate state right now is that there are still millions and millions of unlabeled images across the web," said Laura Allen, a senior program manager on the Chrome accessibility team.
Microsoft's AI will describe images in Word and PowerPoint for blind users
Artificial intelligence may be making small and steady advances in general-purpose situations like digital assistants. But it's the more subtle AI accessibility features that have a more substantial impact today, especially for users with disabilities. For instance, an upcoming feature for Office apps like Microsoft Word and PowerPoint will automatically suggest image and slide deck captions, called alt-text, using AI algorithms. That way, when those files are presented to blind users, computer tools designed to translate the information onscreen into audio have text descriptions to work with. Microsoft is accomplishing this feat with its Computer Vision Cognitive Service, which uses neural networks trained with deep learning techniques to better understand and describe the contents of images.
Google's 'Show and Tell' AI can tell you what's in a photo with nearly 94% accuracy
Artificial intelligence systems have recently begun to try their hand at writing picture captions, often producing hilarious, and even offensive, blunders. But Google's Show and Tell algorithm has almost perfected the craft. According to the firm, the AI can now describe images with nearly 94 percent accuracy and may even 'understand' the context and deeper meaning of a scene. Google has released the open-source code for its image captioning system, allowing developers to take part, the firm revealed on its research blog.
Teaching Computers to Describe Images as People Would
Let's say you're scrolling through your favorite social media app and you come across a series of pictures of a man in a tuxedo and a woman in a long white dress. An automated image captioning system might describe that scene as "a picture of a man and a woman," or maybe even "a bride and a groom." But a person might look at the pictures and think, "Wow, my friends got married!" As image captioning tools get increasingly good at correctly recognizing the objects in an image, a group of researchers is taking the technology one step further. They are working on a system that can automatically describe a series of images in the same kind of way that a human would, by focusing not just on the items in the picture but also on what's happening and how it might make a person feel. "Captioning is about taking concrete objects and putting them together in a literal description," said Margaret Mitchell, a Microsoft researcher who is leading the research project. "What I've been calling visual ...