Revealing Trends in Datasets from the 2022 ACL and EMNLP Conferences
Atuhurra, Jesse, Kamigaito, Hidetaka
arXiv.org Artificial Intelligence
Natural language processing (NLP) has grown significantly since the advent of the Transformer architecture. Transformers have given rise to pre-trained large language models (PLMs), and the performance of NLP systems has improved tremendously across several tasks. NLP systems are now on par with or, in some cases, better than humans at specific tasks. However, it remains the norm that better-quality datasets at pretraining time enable PLMs to achieve better performance, regardless of the task. The need for quality datasets has prompted NLP researchers to keep creating new datasets to satisfy particular needs. For example, the two top NLP conferences, ACL and EMNLP, accepted ninety-two papers in 2022 that introduced new datasets. This work aims to uncover the trends and insights found in these datasets. Moreover, we provide suggestions to researchers interested in curating datasets in the future.
Jul-15-2024
- Country:
  - Africa > Niger (0.04)
  - Asia
    - Japan > Honshū
      - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
    - India > West Bengal
      - Kharagpur (0.04)
    - Indonesia > Sulawesi
      - North Sulawesi > Manado (0.04)
    - Middle East
      - Israel (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.18)
    - Russia (0.14)
    - China
      - Beijing > Beijing (0.04)
      - Heilongjiang Province > Harbin (0.04)
      - Hong Kong (0.05)
      - Jiangsu Province > Nanjing (0.04)
      - Shaanxi Province > Xi'an (0.04)
      - Shanghai > Shanghai (0.04)
    - South Korea > Seoul
      - Seoul (0.04)
    - Vietnam > Hanoi
      - Hanoi (0.04)
    - Singapore (0.04)
    - Philippines > Luzon
      - Ilocos Region > Province of Pangasinan (0.04)
  - Europe
    - Ireland > Leinster
      - County Dublin > Dublin (0.06)
    - Ukraine (0.14)
    - Sweden > Västerbotten County
      - Umeå (0.04)
    - Russia (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
      - Oxfordshire > Oxford (0.14)
    - Netherlands > Limburg
      - Maastricht (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Germany
      - Bavaria > Upper Bavaria
        - Munich (0.04)
      - Hesse > Darmstadt Region
        - Darmstadt (0.04)
    - Switzerland > Zürich
      - Zürich (0.04)
  - North America
    - Mexico > Oaxaca (0.04)
    - United States
      - California
        - Los Angeles County > Los Angeles (0.14)
        - Orange County > Irvine (0.04)
        - San Diego County > San Diego (0.04)
      - Virginia (0.04)
      - Michigan (0.04)
      - Illinois > Cook County
        - Chicago (0.04)
      - Oregon (0.04)
      - Arizona (0.04)
      - Utah (0.04)
      - New York > Suffolk County
        - Stony Brook (0.04)
      - Texas > Travis County
        - Austin (0.04)
  - Oceania > Australia
    - New South Wales (0.04)
  - South America > Peru
    - Cusco Department > Cusco Province
      - Cusco (0.04)
    - Huánuco Department > Huánuco Province
      - Huánuco (0.04)
- Genre:
  - Research Report (1.00)
- Industry:
  - Education (0.94)
  - Health & Medicine (1.00)
  - Information Technology > Services (0.46)
- Technology: