Goto

Collaborating Authors

 directive


Beyond Oracle: Verifier-Supervision for Instruction Hierarchy in Reasoning and Instruction-Tuned LLMs

Neural Information Processing Systems

Large language models (LLMs) are often prompted with multi-level directives, such as system instructions and user queries, that imply a hierarchy of authority. Yet models frequently fail to enforce this structure, especially in multi-step reasoning where errors propagate across intermediate steps. Existing methods rely on oracle completions but lack verifiable reward signals or intermediate traces, limiting their applicability. We introduce a unified supervision framework that embeds programmatically verifiable checkers into synthesized instruction-conflict instances. Each instance pairs a compliance directive with a conflicting one, along with an executable verifier that deterministically checks output adherence. This enables alignment without oracle labels or reasoning traces, supporting both instruction-tuned and reasoning models. The framework is instantiated via a synthesis pipeline that includes unittest-based validation, LLM-assisted repair, and a probabilistic analysis of cleaning reliability. Fine-tuning on the resulting data improves instruction hierarchy adherence and boosts safety robustness, generalizing to adversarial safety benchmarks without task-specific supervision. This highlights verifiable supervision as a scalable foundation for robust alignment.


Beyond Oracle: Verifier-Supervision for Instruction Hierarchy in Reasoning and Instruction-Tuned LLMs

Neural Information Processing Systems

Large language models (LLMs) are often prompted with multi-level directives, such as system instructions and user queries, that imply a hierarchy of authority. Yet models frequently fail to enforce this structure, especially in multi-step reasoning where errors propagate across intermediate steps. Existing methods rely on oracle completions but lack verifiable reward signals or intermediate traces, limiting their applicability. We introduce a unified supervision framework that embeds programmatically verifiable checkers into synthesized instruction-conflict instances. Each instance pairs a compliance directive with a conflicting one, along with an executable verifier that deterministically checks output adherence. This enables alignment without oracle labels or reasoning traces, supporting both instruction-tuned and reasoning models. The framework is instantiated via a synthesis pipeline that includes unit-test-based validation, LLM-assisted repair, and a probabilistic analysis of cleaning reliability. Fine-tuning on the resulting data improves instruction hierarchy adherence and boosts safety robustness, generalizing to adversarial safety benchmarks without task-specific supervision. This highlights verifiable supervision as a scalable foundation for robust alignment.


CISA Tells US Agencies to Fix Security Bugs in as Little as 3 Days Thanks to AI Threats

WIRED

"Defenders cannot afford to take weeks to patch," one Cybersecurity and Infrastructure Security Agency official warned on Wednesday. With new generations of AI models fueling both rapid software vulnerability discovery and the potential for faster exploitation by malicious hackers, the United States Cybersecurity and Infrastructure Security Agency released a new directive on Wednesday that requires more rapid and efficient software patching by federal civilian agencies. The "binding operational directive" (BOD) lays out a rubric for how quickly bugs must be fixed based on four assessments of urgency, with a turnaround time in critical cases of just three days. Chris Butera, CISA's acting executive assistant director for cybersecurity, told reporters on Wednesday that the goal of the directive is to help agencies prioritize, so they can address the most problematic vulnerabilities first while taking more time to remediate bugs that pose a less-pressing risk. The directive comes as private companies and governments have been scrambling to assess the extent of the cybersecurity reckoning that AI vulnerability and exploit development capabilities could unleash.


Protein Thoughts: Interpretable Reasoning with Tree of Thoughts and Embedding-Space Flow Matching for Protein-Protein Interaction Discovery

arXiv.org Machine Learning

Protein-protein interactions (PPIs) govern nearly all cellular processes, yet computational methods for identifying binding partners typically produce ranked predictions without mechanistic justification. This creates a fundamental barrier to adoption because biologists cannot assess whether predictions reflect genuine biochemical insight or spurious correlations. We present \textbf{Protein Thoughts}, a framework that reformulates PPI discovery as an interpretable search problem with explicit reasoning. The system decomposes binding evidence into four biologically meaningful signals: sequence similarity reflecting evolutionary relationships, structural complementarity capturing geometric fit, interface balance, and chemical compatibility encoding residue-level interactions. Rather than collapsing these signals into an opaque score, we preserve their individual contributions through a transparent value function that enables both ranking and auditing. To navigate large candidate spaces efficiently, we introduce hypothesis-guided entropy-regularized Tree-of-Thoughts search. A fine-tuned language model generates search directives from embedding-derived features, classifying candidates as high-priority, exploratory, or skippable. These directives condition a Boltzmann policy that balances exploitation with entropy-driven exploration, while hypothesis-aware pruning prevents premature abandonment of promising candidates. For candidates exhibiting score disagreement, hypothesis-conditioned embedding-space flow matching transports protein embeddings toward the binder manifold. On the SHS148k benchmark, Protein Thoughts achieves mean best-binder rank of 11.2 versus 47.7 for an entropic tree search baseline, a 76% improvement, and for binding prediction the trained value function achieves $91.08 \pm 0.19$ Micro-F1, outperforming existing PPI methods on the same dataset.


Language models are weak learners

Neural Information Processing Systems

A central notion in practical and theoretical machine learning is that of a weak learner, classifiers that achieve better-than-random performance (on any given distribution over data), even by a small margin. Such weak learners form the practical basis for canonical machine learning methods such as boosting.


OMPILOT: Harnessing Transformer Models for Auto Parallelization to Shared Memory Computing Paradigms

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) have significantly accelerated progress in code translation, enabling more accurate and efficient transformation across programming languages. While originally developed for natural language processing, LLMs have shown strong capabilities in modeling programming language syntax and semantics, outperforming traditional rule-based systems in both accuracy and flexibility. These models have streamlined cross-language conversion, reduced development overhead, and accelerated legacy code migration. In this paper, we introduce OMPILOT, a novel domain-specific encoder-decoder transformer tailored for translating C++ code into OpenMP, enabling effective shared-memory parallelization. OMPILOT leverages custom pre-training objectives that incorporate the semantics of parallel constructs and combines both unsupervised and supervised learning strategies to improve code translation robustness. Unlike previous work that focused primarily on loop-level transformations, OMPILOT operates at the function level to capture a wider semantic context. To evaluate our approach, we propose OMPBLEU, a novel composite metric specifically crafted to assess the correctness and quality of OpenMP parallel constructs, addressing limitations in conventional translation metrics.


Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics

arXiv.org Artificial Intelligence

As information grows exponentially, enterprises face increasing pressure to transform unstructured data into coherent, actionable insights. While autonomous agents show promise, they often struggle with domain-specific nuances, intent alignment, and enterprise integration. We present Enterprise Deep Research (EDR), a multi-agent system that integrates (1) a Master Planning Agent for adaptive query decomposition, (2) four specialized search agents (General, Academic, GitHub, LinkedIn), (3) an extensible MCP-based tool ecosystem supporting NL2SQL, file analysis, and enterprise workflows, (4) a Visualization Agent for data-driven insights, and (5) a reflection mechanism that detects knowledge gaps and updates research direction with optional human-in-the-loop steering guidance. These components enable automated report generation, real-time streaming, and seamless enterprise deployment, as validated on internal datasets. On open-ended benchmarks including DeepResearch Bench and DeepConsult, EDR outperforms state-of-the-art agentic systems without any human steering. We release the EDR framework and benchmark trajectories to advance research on multi-agent reasoning applications. Code at https://github.com/SalesforceAIResearch/enterprise-deep-research and Dataset at https://huggingface.co/datasets/Salesforce/EDR-200


SimpleVSF: VLM-Scoring Fusion for Trajectory Prediction of End-to-End Autonomous Driving

arXiv.org Artificial Intelligence

End-to-end autonomous driving has emerged as a promising paradigm for achieving robust and intelligent driving policies. However, existing end-to-end methods still face significant challenges, such as suboptimal decision-making in complex scenarios. In this paper,we propose SimpleVSF (Simple VLM-Scoring Fusion), a novel framework that enhances end-to-end planning by leveraging the cognitive capabilities of Vision-Language Models (VLMs) and advanced trajectory fusion techniques. We utilize the conventional scorers and the novel VLM-enhanced scorers. And we leverage a robust weight fusioner for quantitative aggregation and a powerful VLM-based fusioner for qualitative, context-aware decision-making. As the leading approach in the ICCV 2025 NAVSIM v2 End-to-End Driving Challenge, our SimpleVSF framework demonstrates state-of-the-art performance, achieving a superior balance between safety, comfort, and efficiency.


Directive, Metacognitive or a Blend of Both? A Comparison of AI-Generated Feedback Types on Student Engagement, Confidence, and Outcomes

arXiv.org Artificial Intelligence

Feedback is one of the most powerful influences on student learning, with extensive research examining how best to implement it in educational settings. Increasingly, feedback is being generated by artificial intelligence (AI), offering scalable and adaptive responses. Two widely studied approaches are directive feedback, which gives explicit explanations and reduces cognitive load to speed up learning, and metacognitive feedback which prompts learners to reflect, track their progress, and develop self-regulated learning (SRL) skills. While both approaches have clear theoretical advantages, their comparative effects on engagement, confidence, and quality of work remain underexplored. This study presents a semester-long randomised controlled trial with 329 students in an introductory design and programming course using an adaptive educational platform. Participants were assigned to receive directive, metacognitive, or hybrid AI-generated feedback that blended elements of both directive and metacognitive feedback. Results showed that revision behaviour differed across feedback conditions, with Hybrid prompting the most revisions compared to Directive and Metacognitive. Confidence ratings were uniformly high, and resource quality outcomes were comparable across conditions. These findings highlight the promise of AI in delivering feedback that balances clarity with reflection. Hybrid approaches, in particular, show potential to combine actionable guidance for immediate improvement with opportunities for self-reflection and metacognitive growth.


Beacon: Single-Turn Diagnosis and Mitigation of Latent Sycophancy in Large Language Models

arXiv.org Artificial Intelligence

Large language models internalize a structural trade-off between truthfulness and obsequious flattery, emerging from reward optimization that conflates helpfulness with polite submission. This latent bias, known as sycophancy, manifests as a preference for user agreement over principled reasoning. We introduce Beacon, a single-turn forced-choice benchmark that isolates this bias independent of conversational context, enabling precise measurement of the tension between factual accuracy and submissive bias. Evaluations across twelve state-of-the-art models reveal that sycophancy decomposes into stable linguistic and affective sub-biases, each scaling with model capacity. We further propose prompt-level and activation-level interventions that modulate these biases in opposing directions, exposing the internal geometry of alignment as a dynamic manifold between truthfulness and socially compliant judgment. Beacon reframes sycophancy as a measurable form of normative misgeneralization, providing a reproducible foundation for studying and mitigating alignment drift in large-scale generative systems.