AITopics | underspecification

Collaborating Authors

underspecification

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

33201f38001dd381aba2c462051449ba-Paper-Conference.pdf

Neural Information Processing SystemsFeb-19-2026, 00:51:18 GMT

equivalent model, explanation, shapley value, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.29)
Asia > Singapore (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Banking & Finance (0.46)
Information Technology > Security & Privacy (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Accounting for Underspecification in Statistical Claims of Model Superiority

Sanchez, Thomas, Gordaliza, Pedro M., Cuadra, Meritxell Bach

arXiv.org Artificial IntelligenceNov-5-2025

Machine learning methods are increasingly applied in medical imaging, yet many reported improvements lack statistical robustness: recent works have highlighted that small but significant performance gains are highly likely to be false positives. However, these analyses do not take \emph{underspecification} into account -- the fact that models achieving similar validation scores may behave differently on unseen data due to random initialization or training dynamics. Here, we extend a recent statistical framework modeling false outperformance claims to include underspecification as an additional variance component. Our simulations demonstrate that even modest seed variability ($\sim1\%$) substantially increases the evidence required to support superiority claims. Our findings underscore the need for explicit modeling of training variance when validating medical imaging systems.

artificial intelligence, machine learning, underspecification, (17 more...)

arXiv.org Artificial Intelligence

2511.02453

Country: Europe > Switzerland (0.15)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

What Prompts Don't Say: Understanding and Managing Underspecification in LLM Prompts

Yang, Chenyang, Shi, Yike, Ma, Qianou, Liu, Michael Xieyang, Kästner, Christian, Wu, Tongshuang

arXiv.org Artificial IntelligenceOct-8-2025

Prompt underspecification is a common challenge when interacting with LLMs. In this paper, we present an in-depth analysis of this problem, showing that while LLMs can often infer unspecified requirements by default (41.1%), such behavior is fragile: Under-specified prompts are 2x as likely to regress across model or prompt changes, sometimes with accuracy drops exceeding 20%. This instability makes it difficult to reliably build LLM applications. Moreover, simply specifying all requirements does not consistently help, as models have limited instruction-following ability and requirements can conflict. Standard prompt optimizers likewise provide little benefit. To address these issues, we propose requirements-aware prompt optimization mechanisms that improve performance by 4.8% on average over baselines. We further advocate for a systematic process of proactive requirements discovery, evaluation, and monitoring to better manage prompt underspecification in practice.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.1336

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On Aligning Prediction Models with Clinical Experiential Learning: A Prostate Cancer Case Study

Vallon, Jacqueline J., Overman, William, Xu, Wanqiao, Panjwani, Neil, Ling, Xi, Vij, Sushmita, Bagshaw, Hilary P., Leppert, John T., Shah, Sumit, Sonn, Geoffrey, Srinivas, Sandy, Pollom, Erqi, Buyyounouski, Mark K., Bayati, Mohsen

arXiv.org Artificial IntelligenceSep-5-2025

Over the past decade, the use of machine learning (ML) models in healthcare applications has rapidly increased. Despite high performance, modern ML models do not always capture patterns the end user requires. For example, a model may predict a non-monotonically decreasing relationship between cancer stage and survival, keeping all other features fixed. In this paper, we present a reproducible framework for investigating this misalignment between model behavior and clinical experiential learning, focusing on the effects of underspecification of modern ML pipelines. In a prostate cancer outcome prediction case study, we first identify and address these inconsistencies by incorporating clinical knowledge, collected by a survey, via constraints into the ML model, and subsequently analyze the impact on model performance and behavior across degrees of underspecification. The approach shows that aligning the ML model with clinical experiential learning is possible without compromising performance. Motivated by recent literature in generative AI, we further examine the feasibility of a feedback-driven alignment approach in non-generative AI clinical risk prediction models through a randomized experiment with clinicians. Our findings illustrate that, by eliciting clinicians' model preferences using our proposed methodology, the larger the difference in how the constrained and unconstrained models make predictions for a patient, the more apparent the difference is in clinical interpretation.

artificial intelligence, clinician, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.04053

Country: North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Prostate Cancer (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback

7a50d83a1e70e9d96c3357438aed7a44-Paper.pdf

Neural Information Processing SystemsAug-15-2025, 08:15:50 GMT

interaction, probability, sequence, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Costa Rica > Heredia Province > Heredia (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Spain (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Add feedback

33201f38001dd381aba2c462051449ba-Paper-Conference.pdf

Neural Information Processing SystemsAug-14-2025, 04:53:01 GMT

equivalent model, explanation, shapley value, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.29)
Asia > Singapore (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Banking & Finance (0.46)
Information Technology > Security & Privacy (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Small Edits, Big Consequences: Telling Good from Bad Robustness in Large Language Models

Ismailov, Altynbek, Asanova, Salia

arXiv.org Artificial IntelligenceJul-23-2025

Large language models (LLMs) now write code in settings where misreading a single word can break safety or cost money, yet we still expect them to overlook stray typos. To probe where useful robustness ends and harmful insensitivity begins, we compile 50 LeetCode problems and craft three minimal prompt perturbations that should vary in importance: (i) progressive underspecification deleting 10 % of words per step; (ii) lexical flip swapping a pivotal quantifier ("max" to "min"); and (iii) jargon inflation replacing a common noun with an obscure technical synonym. Six frontier models, including three "reasoning-tuned" versions, solve each mutated prompt, and their Python outputs are checked against the original test suites to reveal whether they reused the baseline solution or adapted. Among 11 853 generations we observe a sharp double asymmetry. Models remain correct in 85 % of cases even after 90 % of the prompt is missing, showing over-robustness to underspecification, yet only 54 % react to a single quantifier flip that reverses the task, with reasoning-tuned variants even less sensitive than their bases. Jargon edits lie in between, passing through 56 %. Current LLMs thus blur the line between harmless noise and meaning - changing edits, often treating both as ignorable. Masking salient anchors such as function names can force re - evaluation. We advocate evaluation and training protocols that reward differential sensitivity: stay steady under benign noise but adapt - or refuse - when semantics truly change.

exam question, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2507.15868

Genre: Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Principles of semantic and functional efficiency in grammatical patterning

Cheng, Emily, Franzon, Francesca

arXiv.org Artificial IntelligenceOct-21-2024

Grammatical features such as number and gender serve two central functions in human languages. While they encode salient semantic attributes like numerosity and animacy, they also offload sentence processing cost by predictably linking words together via grammatical agreement. Grammars exhibit consistent organizational patterns across diverse languages, invariably rooted in a semantic foundation, a widely confirmed but still theoretically unexplained phenomenon. To explain the basis of universal grammatical patterns, we unify two fundamental properties of grammar, semantic encoding and agreement-based predictability, into a single information-theoretic objective under cognitive constraints. Our analyses reveal that grammatical organization provably inherits from perceptual attributes, but that grammars empirically prioritize functional goals, promoting efficient language processing over semantic encoding.

artificial intelligence, grammatical value, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.15865

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

Do Pre-Trained Language Models Detect and Understand Semantic Underspecification? Ask the DUST!

Wildenburg, Frank, Hanna, Michael, Pezzelle, Sandro

arXiv.org Artificial IntelligenceJun-12-2024

In everyday language use, speakers frequently utter and interpret sentences that are semantically underspecified, namely, whose content is insufficient to fully convey their message or interpret them univocally. For example, to interpret the underspecified sentence "Don't spend too much", which leaves implicit what (not) to spend, additional linguistic context or outside knowledge is needed. In this work, we propose a novel Dataset of semantically Underspecified Sentences grouped by Type (DUST) and use it to study whether pre-trained language models (LMs) correctly identify and interpret underspecified sentences. We find that newer LMs are reasonably able to identify underspecified sentences when explicitly prompted. However, interpreting them correctly is much harder for any LMs. Our experiments show that when interpreting underspecified sentences, LMs exhibit little uncertainty, contrary to what theoretical accounts of underspecification would predict. Overall, our study reveals limitations in current models' processing of sentence semantics and highlights the importance of using naturalistic data and communicative scenarios when evaluating LMs' language capabilities.

andrei, underspecification, underspecified sentence, (15 more...)

arXiv.org Artificial Intelligence

2402.12486

Country:

North America > United States > Washington > King County > Seattle (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Singapore (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Beyond development: Challenges in deploying machine learning models for structural engineering applications

Esteghamati, Mohsen Zaker, Bean, Brennan, Burton, Henry V., Naser, M. Z.

arXiv.org Machine LearningApr-18-2024

Machine learning (ML)-based solutions are rapidly changing the landscape of many fields, including structural engineering. Despite their promising performance, these approaches are usually only demonstrated as proof-of-concept in structural engineering, and are rarely deployed for real-world applications. This paper aims to illustrate the challenges of developing ML models suitable for deployment through two illustrative examples. Among various pitfalls, the presented discussion focuses on model overfitting and underspecification, training data representativeness, variable omission bias, and cross-validation.

deployment, ml model, zaker esteghamati, (15 more...)

arXiv.org Machine Learning

2404.12544

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Utah > Cache County > Logan (0.04)

Genre: Research Report (1.00)

Industry: Materials > Construction Materials (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback