AITopics

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States (0.04)

Genre: Research Report (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Neural Information Processing SystemsDec-24-2025, 04:22:09 GMT

Make Sure You're Unsure: A Framework for Verifying Probabilistic Specifications

Most real world applications require dealing with stochasticity like sensor noise or predictive uncertainty, where formal specifications of desired behavior are inherently probabilistic. Despite the promise of formal verification in ensuring the reliability of neural networks, progress in the direction of probabilistic specifications has been limited. In this direction, we first introduce a general formulation of probabilistic specifications for neural networks, which captures both probabilistic networks (e.g., Bayesian neural networks, MC-Dropout networks) and uncertain inputs (distributions over inputs arising from sensor noise or other perturbations). We then propose a general technique to verify such specifications by generalizing the notion of Lagrangian duality, replacing standard Lagrangian multipliers with functional multipliers that can be arbitrary functions of the activations at a given layer. We show that an optimal choice of functional multipliers leads to exact verification (i.e., sound and complete verification), and for specific forms of multipliers, we develop tractable practical verification algorithms.

name change, specification, verifying probabilistic specification, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.58)

arXiv.org Artificial IntelligenceNov-6-2025

Academics and Generative AI: Empirical and Epistemic Indicators of Policy-Practice Voids

Ravenor, R. Yamamoto

As generative AI diffuses through academia, policy-practice divergence becomes consequential, creating demand for auditable indicators of alignment. This study prototypes a ten-item, indirect-elicitation instrument embedded in a structured interpretive framework to surface voids between institutional rules and practitioner AI use. The framework extracts empirical and epistemic signals from academics, yielding three filtered indicators of such voids: (1) AI-integrated assessment capacity (proxy) - within a three-signal screen (AI skill, perceived teaching benefit, detection confidence), the share who would fully allow AI in exams; (2) sector-level necessity (proxy) - among high output control users who still credit AI with high contribution, the proportion who judge AI capable of challenging established disciplines; and (3) ontological stance - among respondents who judge AI different in kind from prior tools, report practice change, and pass a metacognition gate, the split between material and immaterial views as an ontological map aligning procurement claims with evidence classes.

artificial intelligence, machine learning, natural language, (19 more...)

2511.02875

Country:

Asia > Japan > Honshū > Kantō (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report > Experimental Study (0.69)

Industry: Education > Educational Setting > Higher Education (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.76)

Neural Information Processing SystemsAug-17-2025, 19:13:41 GMT

d9de6a144a3cc26cb4b3c47b206a121a-Supplemental.pdf

artificial intelligence, gcw, machine learning, (15 more...)

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

arXiv.org Artificial IntelligenceJun-5-2025

High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning

Franzmeyer, Tim, Sravankumar, Archie, Liu, Lijuan, Mao, Yuning, Hou, Rui, Wang, Sinong, Foerster, Jakob N., Zettlemoyer, Luke, Khabsa, Madian

Large Language Models (LLMs) currently respond to every prompt. However, they can produce incorrect answers when they lack knowledge or capability -- a problem known as hallucination. We instead propose post-training an LLM to generate content only when confident in its correctness and to otherwise (partially) abstain. Specifically, our method, HALT, produces capability-aligned post-training data that encodes what the model can and cannot reliably generate. We generate this data by splitting responses of the pretrained LLM into factual fragments (atomic statements or reasoning steps), and use ground truth information to identify incorrect fragments. We achieve capability-aligned finetuning responses by either removing incorrect fragments or replacing them with "Unsure from Here" -- according to a tunable threshold that allows practitioners to trade off response completeness and mean correctness of the response's fragments. We finetune four open-source models for biography writing, mathematics, coding, and medicine with HALT for three different trade-off thresholds. HALT effectively trades off response completeness for correctness, increasing the mean correctness of response fragments by 15% on average, while resulting in a 4% improvement in the F1 score (mean of completeness and correctness of the response) compared to the relevant baselines. By tuning HALT for highest correctness, we train a single reliable Llama3-70B model with correctness increased from 51% to 87% across all four domains while maintaining 53% of the response completeness achieved with standard finetuning.

artificial intelligence, large language model, natural language, (17 more...)

2506.04051

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Neural Information Processing SystemsMay-26-2025, 20:27:14 GMT

Make Sure You're Unsure: A Framework for Verifying Probabilistic Specifications

Most real world applications require dealing with stochasticity like sensor noise or predictive uncertainty, where formal specifications of desired behavior are inherently probabilistic. Despite the promise of formal verification in ensuring the reliability of neural networks, progress in the direction of probabilistic specifications has been limited. In this direction, we first introduce a general formulation of probabilistic specifications for neural networks, which captures both probabilistic networks (e.g., Bayesian neural networks, MC-Dropout networks) and uncertain inputs (distributions over inputs arising from sensor noise or other perturbations). We then propose a general technique to verify such specifications by generalizing the notion of Lagrangian duality, replacing standard Lagrangian multipliers with "functional multipliers" that can be arbitrary functions of the activations at a given layer. We show that an optimal choice of functional multipliers leads to exact verification (i.e., sound and complete verification), and for specific forms of multipliers, we develop tractable practical verification algorithms.

artificial intelligence, machine learning, specification, (10 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.76)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)

Ross, Hayley, Davidson, Kathryn, Kim, Najoung

Is artificial intelligence still intelligence? LLMs generalize to novel adjective-noun pairs, but don't mimic the full human distribution

arXiv.org Artificial IntelligenceOct-22-2024

Inferences from adjective-noun combinations like "Is artificial intelligence still intelligence?" provide a good test bed for LLMs' understanding of meaning and compositional generalization capability, since there are many combinations which are novel to both humans and LLMs but nevertheless elicit convergent human judgments. We study a range of LLMs and find that the largest models we tested are able to draw human-like inferences when the inference is determined by context and can generalize to unseen adjective-noun combinations. We also propose three methods to evaluate LLMs on these inferences out of context, where there is a distribution of human-like answers rather than a single correct answer. We find that LLMs show a human-like distribution on at most 75\% of our dataset, which is promising but still leaves room for improvement.

bigram, large language model, machine learning, (21 more...)

2410.17482

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Jordan (0.04)
North America > Dominican Republic (0.04)
(12 more...)

Genre: Research Report (0.51)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Education (0.93)
Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

Neural Information Processing SystemsOct-10-2024, 15:00:31 GMT

Make Sure You're Unsure: A Framework for Verifying Probabilistic Specifications

Most real world applications require dealing with stochasticity like sensor noise or predictive uncertainty, where formal specifications of desired behavior are inherently probabilistic. Despite the promise of formal verification in ensuring the reliability of neural networks, progress in the direction of probabilistic specifications has been limited. In this direction, we first introduce a general formulation of probabilistic specifications for neural networks, which captures both probabilistic networks (e.g., Bayesian neural networks, MC-Dropout networks) and uncertain inputs (distributions over inputs arising from sensor noise or other perturbations). We then propose a general technique to verify such specifications by generalizing the notion of Lagrangian duality, replacing standard Lagrangian multipliers with "functional multipliers" that can be arbitrary functions of the activations at a given layer. We show that an optimal choice of functional multipliers leads to exact verification (i.e., sound and complete verification), and for specific forms of multipliers, we develop tractable practical verification algorithms.

neural network, specification, verifying probabilistic specification, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.76)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)

Tachella, Julián, Davies, Mike, Jacques, Laurent

UNSURE: Unknown Noise level Stein's Unbiased Risk Estimator

arXiv.org Machine LearningSep-3-2024

Recently, many self-supervised learning methods for image reconstruction have been proposed that can learn from noisy data alone, bypassing the need for ground-truth references. Most existing methods cluster around two classes: i) Noise2Self and similar cross-validation methods that require very mild knowledge about the noise distribution, and ii) Stein's Unbiased Risk Estimator (SURE) and similar approaches that assume full knowledge of the distribution. The first class of methods is often suboptimal compared to supervised learning, and the second class is often impractical, as the noise level is generally unknown in real-world applications. In this paper, we provide a theoretical framework that characterizes this expressivity-robustness trade-off and propose a new approach based on SURE, but unlike the standard SURE, does not require knowledge about the noise level. Throughout a series of experiments, we show that the proposed estimator outperforms other existing self-supervised methods on various imaging inverse problems.

unbiased risk estimator, unknown noise level stein, unsure

arXiv.org Machine Learning

2409.01985

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.73)

T, Balamurali B, Chen, Jer-Ming

Performance Assessment of ChatGPT vs Bard in Detecting Alzheimer's Dementia

arXiv.org Artificial IntelligenceJan-30-2024

Large language models (LLMs) find increasing applications in many fields. Here, three LLM chatbots (ChatGPT-3.5, ChatGPT-4 and Bard) are assessed - in their current form, as publicly available - for their ability to recognize Alzheimer's Dementia (AD) and Cognitively Normal (CN) individuals using textual input derived from spontaneous speech recordings. Zero-shot learning approach is used at two levels of independent queries, with the second query (chain-of-thought prompting) eliciting more detailed than the first. Each LLM chatbot's performance is evaluated on the prediction generated in terms of accuracy, sensitivity, specificity, precision and F1 score. LLM chatbots generated three-class outcome ("AD", "CN", or "Unsure"). When positively identifying AD, Bard produced highest true-positives (89% recall) and highest F1 score (71%), but tended to misidentify CN as AD, with high confidence (low "Unsure" rates); for positively identifying CN, GPT-4 resulted in the highest true-negatives at 56% and highest F1 score (62%), adopting a diplomatic stance (moderate "Unsure" rates). Overall, three LLM chatbots identify AD vs CN surpassing chance-levels but do not currently satisfy clinical application.

cn subject, llm chatbot, unsure, (14 more...)

2402.01751

Country: Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Dementia (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)