BRIDGE: Building Representations In Domain Guided Program Verification
George, Robert Joseph, Eisenach, Carson, Ghai, Udaya, Perrault-Joncas, Dominique, Anandkumar, Anima, Foster, Dean
Large language models (LLMs) have achieved impressive results in code generation, yet struggle with program verification, especially in interactive proof frameworks such as Lean4. A central challenge is scalability: verified synthesis requires not just code, but also precise specifications and correctness proofs, and existing approaches rarely span all three domains. We present BRIDGE, the first systematic study of structured prompting for scalable verified program generation. BRIDGE decomposes verification into three interconnected domains: Code (executable implementations), Specifications (formal intent statements), and Proofs (constructive correctness arguments). Our key idea is to elicit distinct reasoning behaviors (functional, specification-driven, and proof-oriented) as intermediate representations that preserve semantic structure and connect these domains. Through systematic ablations, we show that this approach substantially improves both accuracy and efficiency beyond standard error-feedback methods. For example, functional reasoning improves correctness of code in formal languages (Lean4) by nearly 1.5x (pass@5) over direct baselines. In inference-time compute, functional reasoning is also 2x more efficient, achieving higher pass rates with fewer generations and lower total sampling budgets. Similarly, we find that specification-driven prompting boosts Python coding pass rates by up to 17.5%. These findings suggest that structured domain alignment is a promising direction for advancing verified synthesis. BRIDGE establishes a foundation for training via expert iteration or RLVR, enabling models to internalize these reasoning strategies across code, specifications, and proofs.
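The three domains the abstract names can be illustrated with a minimal Lean4 sketch (illustrative only; `double` and its spec are not from the paper): an executable implementation, a formal statement of intent, and a constructive proof discharging it.

```lean
-- Code: an executable implementation
def double (n : Nat) : Nat := n + n

-- Specification: a formal statement of intent
-- Proof: a constructive correctness argument connecting the two
theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

In BRIDGE's terms, a model would be prompted to produce each of the three pieces via a distinct reasoning behavior rather than emitting them in a single undifferentiated pass.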
Supplementary Materials for MLP-Mixer: An all-MLP Architecture for Vision
We did not observe any noticeable improvements. In other words, token-mixing MLPs operate on only one channel at a time. All layers in Mixer retain the same isotropic design. However, these modifications did not lead to consistent improvements, so we dropped them.
Table 1: Hyperparameter settings used for pre-training Mixer models.
Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark
Shen, Xinjie, Li, Mufei, Li, Pan
The deployment of Large Language Models (LLMs) in embodied agents creates an urgent need to measure their privacy awareness in the physical world. Existing evaluation methods, however, are confined to natural-language-based scenarios. To bridge this gap, we introduce EAPrivacy, a comprehensive evaluation benchmark designed to quantify the physical-world privacy awareness of LLM-powered agents. EAPrivacy utilizes procedurally generated scenarios across four tiers to test an agent's ability to handle sensitive objects, adapt to changing environments, balance task execution with privacy constraints, and resolve conflicts with social norms. Our measurements reveal a critical deficit in current models. The top-performing model, Gemini 2.5 Pro, achieved only 59% accuracy in scenarios involving changing physical environments. Furthermore, when a task was accompanied by a privacy request, models prioritized completion over the constraint in up to 86% of cases. In high-stakes situations pitting privacy against critical social norms, leading models like GPT-4o and Claude-3.5-haiku disregarded the social norm over 15% of the time. These findings, demonstrated by our benchmark, underscore a fundamental misalignment in LLMs regarding physically grounded privacy and establish the need for more robust, physically-aware alignment. Codes and datasets will be available at https://github.com/Graph-COM/EAPrivacy.
XML Prompting as Grammar-Constrained Interaction: Fixed-Point Semantics, Convergence Guarantees, and Human-AI Protocols
Structured prompting with XML tags has emerged as an effective way to steer large language models (LLMs) toward parseable, schema-adherent outputs in real-world systems. We develop a logic-first treatment of XML prompting that unifies (i) grammar-constrained decoding, (ii) fixed-point semantics over lattices of hierarchical prompts, and (iii) convergent human-AI interaction loops. We formalize a complete lattice of XML trees under a refinement order and prove that monotone prompt-to-prompt operators admit least fixed points (Knaster-Tarski) that characterize steady-state protocols; under a task-aware contraction metric on trees, we further prove Banach-style convergence of iterative guidance. We instantiate these results with context-free grammars (CFGs) for XML schemas and show how constrained decoding guarantees well-formedness while preserving task performance. A set of multi-layer human-AI interaction recipes demonstrates practical deployment patterns, including multi-pass "plan-verify-revise" routines and agentic tool use. We provide mathematically complete proofs and tie our framework to recent advances in grammar-aligned decoding, chain-of-verification, and programmatic prompting. Keywords: XML prompting; grammar-constrained decoding; fixed-point theorems; Banach contraction; Knaster-Tarski; modal µ-calculus; structured outputs; human-AI interaction; arXiv cs.AI; arXiv cs.CL
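The least-fixed-point construction can be sketched in plain Python (all names hypothetical; nested dicts stand in for XML trees): a monotone refinement operator is applied repeatedly until the prompt tree stops changing, i.e. Kleene iteration toward the least fixed point the abstract attributes to Knaster-Tarski.

```python
# Illustrative sketch, not the paper's formalism. A "tree" is a dict with a
# "tag" and a list of "children"; refine() is a monotone operator that only
# ever adds structure, so iteration climbs the refinement order.

def refine(tree: dict) -> dict:
    """One monotone refinement step: every <task> node must carry
    <plan>, <verify>, and <revise> children (the plan-verify-revise recipe)."""
    out = dict(tree)
    if out.get("tag") == "task":
        present = {c["tag"] for c in out.get("children", [])}
        for required in ("plan", "verify", "revise"):
            if required not in present:
                out["children"] = out.get("children", []) + [
                    {"tag": required, "children": []}
                ]
    out["children"] = [refine(c) for c in out.get("children", [])]
    return out

def least_fixed_point(tree: dict, max_iter: int = 10) -> dict:
    """Kleene iteration: apply refine until the tree is a fixed point."""
    for _ in range(max_iter):
        nxt = refine(tree)
        if nxt == tree:
            return tree
        tree = nxt
    return tree
```

Because the operator only adds required children and the set of required tags is finite, iteration terminates; the result is unchanged by further refinement, which is exactly the steady-state protocol the abstract describes.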
Toward Reproducible Cross-Backend Compatibility for Deep Learning: A Configuration-First Framework with Three-Tier Verification
This paper presents a configuration-first framework for evaluating cross-backend compatibility in deep learning systems deployed on CPU, GPU, and compiled runtimes. The framework decouples experiments from code using YAML, supports both library and repository models, and employs a three-tier verification protocol covering tensor-level closeness, activation alignment, and task-level metrics. Through 672 checks across multiple models and tolerance settings, we observe that 72.0% of runs pass, with most discrepancies occurring under stricter thresholds. Our results show that detection models and compiled backends are particularly prone to drift, often due to nondeterministic post-processing. We further demonstrate that deterministic adapters and selective fallbacks can substantially improve agreement without significant performance loss. To our knowledge, this is the first unified framework that systematically quantifies and mitigates cross-backend drift in deep learning, providing a reproducible methodology for dependable deployment across heterogeneous runtimes.
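The tiered protocol can be sketched as follows (hypothetical helper names, not the paper's API; the middle tier, activation alignment, would compare intermediate layer outputs the same way and is omitted for brevity). Two backends' outputs are checked at the strictest tier first, falling back to the task-level criterion before declaring drift.

```python
# Minimal sketch of a two-of-three-tier compatibility check between the
# outputs of two backends (e.g. CPU vs. compiled runtime) for one input.

def tensor_close(a, b, rtol=1e-5, atol=1e-8):
    """Tier 1: elementwise closeness, mirroring allclose-style semantics."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

def top1_match(a, b):
    """Tier 3 (task level): do both backends predict the same class?"""
    argmax = lambda v: max(range(len(v)), key=v.__getitem__)
    return argmax(a) == argmax(b)

def verify(a, b, rtol=1e-5):
    """Report the strictest tier the pair of outputs satisfies."""
    if tensor_close(a, b, rtol=rtol):
        return "tensor-level"
    if top1_match(a, b):
        return "task-level"
    return "drift"
```

This ordering reflects the paper's observation that most discrepancies appear only under stricter thresholds: outputs that fail tensor-level closeness may still agree at the task level.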
How and why parents and teachers are introducing young children to AI
Since the release of ChatGPT in late 2022, generative artificial intelligence has trickled down from adults in their offices to university students in campus libraries to teenagers in high school hallways. Now it's reaching the youngest among us, and parents and teachers are grappling with the most responsible way to introduce their under-13s to a new technology that may fundamentally reshape the future. Though the terms of service for ChatGPT, Google's Gemini and other AI models specify that the tools are only meant for those over 13, parents and teachers are taking the matter of AI education into their own hands. Inspired by a story we published on parents who are teaching their children to use AI to set them up for success in school and at work, we asked Guardian readers how and why – or why not – others are doing the same. Though our original story only concerned parents, we have also included teachers in the responses published below, as preparing children for future studies and jobs is one of educators' responsibilities as well.