Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
–Neural Information Processing Systems
Image captioning aims to describe visual content in natural language. As'a picture is worth a thousand words', there could be various correct descriptions for an image. However, with maximum likelihood estimation as the training objective, the captioning model is penalized whenever its prediction mismatches with the label.
Neural Information Processing Systems
Oct-9-2025, 12:25:21 GMT
- Country:
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Asia
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- France > Hauts-de-France
- Italy > Tuscany
- Florence (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Switzerland > Zürich
- Zürich (0.14)
- Belgium > Brussels-Capital Region
- North America
- Canada > Quebec
- Montreal (0.04)
- Dominican Republic (0.04)
- Puerto Rico > San Juan
- San Juan (0.04)
- United States
- California > Los Angeles County
- Long Beach (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Maryland > Baltimore (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Utah > Salt Lake County
- Salt Lake City (0.04)
- California > Los Angeles County
- Canada > Quebec
- Oceania > Australia
- Africa > Ethiopia
- Genre:
- Overview (0.67)