On Uncertainty In Natural Language Processing
arXiv.org Artificial Intelligence
Over the last decade, deep learning has produced increasingly capable systems that are deployed in a wide variety of applications. Natural language processing has been transformed by a number of breakthroughs, including large language models, which are used in a growing number of user-facing applications. To reap the benefits of this technology and reduce potential harms, it is important to quantify the reliability of model predictions and the uncertainties that shroud their development. This thesis studies how uncertainty in natural language processing can be characterized from a linguistic, statistical and neural perspective, and how it can be reduced and quantified through the design of the experimental pipeline. We further explore uncertainty quantification in modeling by theoretically and empirically investigating the effect of inductive model biases in text classification tasks. The corresponding experiments include data for three languages (Danish, English and Finnish) and tasks, as well as a large set of uncertainty quantification approaches. Additionally, we propose a method for calibrated sampling in natural language generation based on non-exchangeable conformal prediction, which provides tighter token sets with better coverage of the actual continuation. Lastly, we develop an approach to quantify confidence in large black-box language models using auxiliary predictors, where the confidence is predicted solely from the input to the target model and the output text it generates.
Oct-4-2024
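The abstract mentions calibrated sampling based on non-exchangeable conformal prediction. As a rough illustration of the general idea, and not the thesis's actual method, the sketch below computes a weighted conformal quantile over calibration scores and uses it to build a token set. The nonconformity score (one minus the token probability), the relevance weights, and the function names `weighted_conformal_quantile` and `token_prediction_set` are all assumptions made for this example.

```python
import math
from typing import Dict, List, Set


def weighted_conformal_quantile(
    scores: List[float], weights: List[float], alpha: float
) -> float:
    """Weighted (1 - alpha) quantile of calibration scores.

    Assumption: each weight lies in [0, 1] and says how relevant a
    calibration point is to the test point. One unit of weight is reserved
    for the unseen test point and placed at +infinity, so the result is
    infinite when the calibration weights cannot reach the target level.
    """
    total = sum(weights) + 1.0  # the extra 1.0 belongs to the test point
    target = (1.0 - alpha) * total
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= target:
            return score
    return math.inf


def token_prediction_set(probs: Dict[str, float], q_hat: float) -> Set[str]:
    """Keep every token whose score (1 - probability) is at most q_hat."""
    return {token for token, p in probs.items() if 1.0 - p <= q_hat}


# Hypothetical calibration data: scores of the true tokens and their weights.
calibration_scores = [0.2, 0.6, 0.9]
calibration_weights = [1.0, 0.5, 0.5]
q_hat = weighted_conformal_quantile(calibration_scores, calibration_weights, 0.5)

# Hypothetical next-token distribution from a language model.
next_token_probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
candidates = token_prediction_set(next_token_probs, q_hat)
```

Sampling would then be restricted to `candidates` after renormalizing their probabilities. Down-weighting less relevant calibration points is what distinguishes this from standard split conformal prediction, where all weights equal one.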
- Country:
- Africa
- Ethiopia > Addis Ababa
- Addis Ababa (0.04)
- Kenya (0.04)
- Rwanda > Kigali
- Kigali (0.04)
- Asia
- Indonesia > Bali (0.04)
- Middle East
- Israel > Tel Aviv District
- Tel Aviv (0.04)
- Jordan (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Philippines (0.04)
- Russia (0.04)
- China > Beijing
- Beijing (0.04)
- South Korea > Seoul
- Seoul (0.04)
- Thailand (0.04)
- Singapore (0.04)
- Myanmar > Tanintharyi Region
- Dawei (0.04)
- India (0.04)
- Europe
- Hungary (0.04)
- Netherlands > North Brabant
- Eindhoven (0.04)
- United Kingdom
- England > Cambridgeshire
- Cambridge (0.04)
- Scotland > City of Glasgow
- Glasgow (0.04)
- Ireland > Connaught
- County Galway > Galway (0.04)
- Belgium > Flanders
- East Flanders > Ghent (0.04)
- West Flanders > Bruges (0.04)
- Latvia (0.04)
- Italy > Sicily
- Palermo (0.04)
- Slovenia > Drava
- Municipality of Benedikt > Benedikt (0.04)
- Russia (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Finland
- Southwest Finland > Turku (0.04)
- Uusimaa > Helsinki (0.04)
- Denmark > Capital Region
- Copenhagen (0.13)
- Spain
- Andalusia
- Cádiz Province > Cadiz (0.04)
- Granada Province > Granada (0.04)
- Catalonia > Barcelona Province
- Barcelona (0.04)
- Valencian Community > Valencia Province
- Valencia (0.04)
- France > Hauts-de-France
- Middle East > Malta
- Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Austria (0.04)
- Germany > Bavaria
- Lower Franconia > Würzburg (0.04)
- Sweden > Stockholm
- Stockholm (0.04)
- North America
- Canada
- Alberta > Census Division No. 15
- Improvement District No. 9 > Banff (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario > Toronto (0.04)
- Quebec > Montreal (0.04)
- Greenland (0.04)
- United States
- California
- Los Angeles County > Long Beach (0.14)
- Monterey County > Monterey (0.04)
- San Diego County > San Diego (0.04)
- Massachusetts (0.04)
- Kentucky (0.04)
- Pennsylvania (0.04)
- Washington > King County
- Georgia > Fulton County
- Atlanta (0.04)
- Iowa (0.04)
- Wisconsin > Dane County
- Madison (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- New York
- Bronx County > New York City (0.04)
- Kings County > New York City (0.04)
- New York County > New York City (0.13)
- Onondaga County > Syracuse (0.04)
- Queens County > New York City (0.04)
- Richmond County > New York City (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Missouri (0.04)
- Arizona > Maricopa County
- Scottsdale (0.04)
- Maryland > Baltimore (0.04)
- Nevada > Clark County
- Las Vegas (0.04)
- Texas > Travis County
- Austin (0.04)
- Oceania
- Australia
- New South Wales > Sydney (0.04)
- Western Australia > Perth (0.04)
- Palau (0.04)
- South America > Brazil
- Rio de Janeiro > Rio de Janeiro (0.04)
- Genre:
- Instructional Material (1.00)
- Overview (1.00)
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Summary/Review (1.00)
- Workflow (0.92)
- Industry:
- Education > Educational Setting (0.67)
- Government > Regional Government
- Health & Medicine > Therapeutic Area
- Neurology (0.67)
- Psychiatry/Psychology (0.67)
- Information Technology (1.00)
- Law (1.00)
- Leisure & Entertainment > Games
- Chess (0.45)
- Transportation (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Issues > Social & Ethical Issues (0.92)
- Machine Learning
- Learning Graphical Models > Directed Networks
- Bayesian Learning (1.00)
- Neural Networks > Deep Learning (1.00)
- Performance Analysis > Accuracy (1.00)
- Statistical Learning > Clustering (0.92)
- Natural Language
- Chatbot (1.00)
- Generation (1.00)
- Grammars & Parsing (0.92)
- Large Language Model (1.00)
- Machine Translation (1.00)
- Text Processing (1.00)
- Representation & Reasoning
- Mathematical & Statistical Methods (0.92)
- Search (0.92)
- Uncertainty > Bayesian Inference (1.00)