elicitation
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
Li, Chloe, Phuong, Mary, Tan, Daniel
As AI systems become more capable of complex agentic tasks, they also become more capable of pursuing undesirable objectives and causing harm. Previous work has attempted to catch these unsafe instances by interrogating models directly about their objectives and behaviors. However, the main weakness of trusting interrogations is that models can lie. We propose self-report fine-tuning (SRFT), a simple supervised fine-tuning technique that trains models to occasionally make factual mistakes, then admit them when asked. We show that the admission of factual errors in simple question-answering settings generalizes out-of-distribution (OOD) to the admission of hidden misaligned objectives in adversarial agentic settings. We evaluate SRFT in OOD stealth tasks, where models are instructed to complete a hidden misaligned objective alongside a user-specified objective without being caught by monitoring. After SRFT, models are more likely to confess the details of their hidden objectives when interrogated, even under strong pressure not to disclose them. Interrogation on SRFT models can detect hidden objectives with near-ceiling performance (F1 score = 0.98), while the baseline model lies when interrogated under the same conditions (F1 score = 0). Interrogation on SRFT models can further elicit the content of the hidden objective, recovering 28-100% details, compared to 0% details recovered in the baseline model and by prefilled assistant turn attacks. This provides a promising technique for promoting honesty propensity and incriminating misaligned AIs.
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
How many patients could we save with LLM priors?
Arai, Shota, Selby, David, Vargo, Andrew, Vollmer, Sebastian
Imagine a world where clinical trials need far fewer patients to achieve the same statistical power, thanks to the knowledge encoded in large language models (LLMs). We present a novel framework for hierarchical Bayesian modeling of adverse events in multi-center clinical trials, leveraging LLM-informed prior distributions. Unlike data augmentation approaches that generate synthetic data points, our methodology directly obtains parametric priors from the model. Our approach systematically elicits informative priors for hyperparameters in hierarchical Bayesian models using a pre-trained LLM, enabling the incorporation of external clinical expertise directly into Bayesian safety modeling. Through comprehensive temperature sensitivity analysis and rigorous cross-validation on real-world clinical trial data, we demonstrate that LLM-derived priors consistently improve predictive performance compared to traditional meta-analytical approaches. This methodology paves the way for more efficient and expert-informed clinical trial design, enabling substantial reductions in the number of patients required to achieve robust safety assessment and with the potential to transform drug safety monitoring and regulatory decision making.
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.47)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.57)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Japan > Honshū > Chūbu > Shizuoka Prefecture > Shizuoka (0.04)
- Asia > Japan > Honshū > Tōhoku (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- (4 more...)
- Leisure & Entertainment (0.93)
- Information Technology (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Communications > Social Media (0.68)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- North America > United States > Illinois (0.05)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
AI for Requirements Engineering: Industry adoption and Practitioner perspectives
Rani, Lekshmi Murali, Svensson, Richard Berntsson, Feldt, Robert
The integration of AI for Requirements Engineering (RE) presents significant benefits but also poses real challenges. Although RE is fundamental to software engineering, limited research has examined AI adoption in RE. We surveyed 55 software practitioners to map AI usage across four RE phases: Elicitation, Analysis, Specification, and Validation, and four approaches for decision making: human-only decisions, AI validation, Human AI Collaboration (HAIC), and full AI automation. Participants also shared their perceptions, challenges, and opportunities when applying AI for RE tasks. Our data show that 58.2% of respondents already use AI in RE, and 69.1% view its impact as positive or very positive. HAIC dominates practice, accounting for 54.4% of all RE techniques, while full AI automation remains minimal at 5.4%. Passive AI validation (4.4 to 6.2%) lags even further behind, indicating that practitioners value AI's active support over passive oversight. These findings suggest that AI is most effective when positioned as a collaborative partner rather than a replacement for human expertise. It also highlights the need for RE-specific HAIC frameworks along with robust and responsible AI governance as AI adoption in RE grows.
- Europe > Switzerland (0.05)
- Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
- Europe > Germany (0.04)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.88)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (0.68)