Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
Bloodgood, Michael, Callison-Burch, Chris
We explore how to improve machine translation systems by adding more translation data in situations where we already have substantial resources. The main challenge is how to buck the trend of diminishing returns that is commonly encountered. We present an active learning-style data solicitation algorithm to meet this challenge. We test it, gathering annotations via Amazon Mechanical Turk, and find that we get an order-of-magnitude increase in the rate of performance improvement.
Oct-21-2014
- Country:
- Asia
- Europe
- France > Occitanie
- Haute-Garonne > Toulouse (0.04)
- Greece > Attica
- Athens (0.04)
- Italy > Trentino-Alto Adige/Südtirol
- Trentino Province > Trento (0.04)
- Middle East > Malta
- Port Region > Southern Harbour District > Valletta (0.04)
- Sweden > Uppsala County
- Uppsala (0.04)
- North America > United States
- New York > New York County
- New York City (0.14)
- Colorado > Boulder County
- Boulder (0.04)
- California
- Los Angeles County > Los Angeles (0.14)
- San Francisco County > San Francisco (0.14)
- New Jersey > Somerset County
- Somerset (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- Maryland > Baltimore (0.04)
- Ohio > Franklin County
- Columbus (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Genre:
- Research Report (0.82)
- Technology: