Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding
Mirac Suzgun, Luke Melas-Kyriazi, Dan Jurafsky
arXiv.org Artificial Intelligence
In open-ended natural-language generation, existing text decoding methods typically struggle to produce text that is both diverse and high-quality. Greedy and beam search are known to suffer from text degeneration and limited linguistic diversity, while temperature, top-k, and nucleus sampling often yield diverse but low-quality outputs. In this work, we present crowd sampling, a family of decoding methods based on Bayesian risk minimization, to address this diversity-quality trade-off. Inspired by the principle of "the wisdom of the crowd," crowd sampling selects, from a pool of candidates, the one with the least expected risk (i.e., the highest expected reward) under a generative model according to a given utility function. Crowd sampling can be seen as a generalization of numerous existing methods, including majority voting, and in practice it can be used as a drop-in replacement for existing sampling methods. Extensive experiments show that crowd sampling delivers improvements of 3-7 ROUGE and BLEU points across a wide range of tasks, including summarization, data-to-text, translation, and textual style transfer, while achieving new state-of-the-art results on WebNLG and WMT'16.
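The selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`mbr_select`, `unigram_f1`, `exact_match`) are hypothetical, the token-overlap utility is a toy stand-in for metrics such as ROUGE or BLEU, and the sketch assumes the sampled pool itself serves as the approximation of the model's distribution when estimating expected utility.

```python
from collections import Counter
from typing import Callable, Sequence


def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility (assumption): F1 overlap of whitespace-separated tokens."""
    hyp_tokens, ref_tokens = hyp.split(), ref.split()
    if not hyp_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(hyp: str, ref: str) -> float:
    """Utility under which the selection rule reduces to majority voting."""
    return 1.0 if hyp == ref else 0.0


def mbr_select(
    candidates: Sequence[str],
    utility: Callable[[str, str], float] = unigram_f1,
) -> str:
    """Return the candidate with the highest average utility against the pool.

    Assumption: every candidate is a sample from the model, so the pool
    doubles as a Monte Carlo estimate of the model's output distribution.
    Ties are broken in favor of the earliest candidate.
    """
    if not candidates:
        raise ValueError("candidates must be non-empty")

    def expected_utility(hyp: str) -> float:
        return sum(utility(hyp, ref) for ref in candidates) / len(candidates)

    return max(candidates, key=expected_utility)


if __name__ == "__main__":
    pool = ["the cat sat", "the cat sat down", "the cat", "a dog ran"]
    print(mbr_select(pool))

    # With an exact-match utility, the most frequent answer wins.
    answers = ["42", "41", "42", "7"]
    print(mbr_select(answers, utility=exact_match))
```

In this sketch the outlier `"a dog ran"` shares no tokens with the other candidates and therefore scores lowest, while the candidate that agrees most with the rest of the pool is returned. The cost is quadratic in the pool size, since every candidate is scored against every other.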
Nov-14-2022