A Categorical Analysis of Large Language Models and Why LLMs Circumvent the Symbol Grounding Problem

Floridi, Luciano, Jia, Yiyang, Tohmé, Fernando

arXiv.org Artificial Intelligence

This paper presents a formal, categorical framework for analysing how humans and large language models (LLMs) transform content into truth-evaluated propositions about a state space of possible worlds $W$, in order to argue that LLMs do not solve but circumvent the symbol grounding problem.


Ideal Attribution and Faithful Watermarks for Language Models

Song, Min Jae, Shahabi, Kameron

arXiv.org Machine Learning

We introduce ideal attribution mechanisms, a formal abstraction for reasoning about attribution decisions over strings. At the core of this abstraction lies the ledger, an append-only log of the prompt-response interaction history between a model and its user. Each mechanism produces deterministic decisions based on the ledger and an explicit selection criterion, making it well-suited to serve as a ground truth for attribution. We frame the design goal of watermarking schemes as faithful representation of ideal attribution mechanisms. This novel perspective brings conceptual clarity, replacing piecemeal probabilistic statements with a unified language for stating the guarantees of each scheme. It also enables precise reasoning about desiderata for future watermarking schemes, even when no current construction achieves them, since the ideal functionalities are specified first. In this way, the framework provides a roadmap that clarifies which guarantees are attainable in an idealized setting and worth pursuing in practice.
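The ledger abstraction can be made concrete with a toy example. The sketch below is ours, not the paper's: it assumes a naive substring-match selection criterion, and the names (`Ledger`, `attribute`) are invented, purely to show how deterministic attribution decisions can follow from an append-only log.

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    """Append-only log of prompt-response interactions (illustrative)."""
    entries: list = field(default_factory=list)

    def append(self, prompt: str, response: str) -> None:
        self.entries.append((prompt, response))

def attribute(ledger: Ledger, candidate: str, min_len: int = 8) -> bool:
    """Deterministic attribution decision over the ledger: flag `candidate`
    as model-generated iff it shares a substring of length >= min_len with
    some logged response. The substring rule is a toy stand-in for the
    paper's abstract 'selection criterion'."""
    for _, response in ledger.entries:
        for i in range(len(candidate) - min_len + 1):
            if candidate[i:i + min_len] in response:
                return True
    return False

ledger = Ledger()
ledger.append("write a haiku",
              "autumn wind rises / over the quiet harbor / gulls wheel and settle")
print(attribute(ledger, "over the quiet harbor lies a town"))  # True
print(attribute(ledger, "completely unrelated text"))          # False
```

Because the decision is a deterministic function of the ledger and the criterion, two parties holding the same ledger always reach the same verdict, which is what makes the mechanism usable as a ground truth.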


The Coding Limits of Robust Watermarking for Generative Models

Francati, Danilo, Goonatilake, Yevin Nikhel, Pawar, Shubham, Venturi, Daniele, Ateniese, Giuseppe

arXiv.org Artificial Intelligence

We ask a basic question about cryptographic watermarking for generative models: to what extent can a watermark remain reliable when an adversary is allowed to corrupt the encoded signal? To study this question, we introduce a minimal coding abstraction that we call a zero-bit tamper-detection code. This is a secret-key procedure that samples a pseudorandom codeword and, given a candidate word, decides whether it should be treated as unmarked content or as the result of tampering with a valid codeword. It captures the two core requirements of robust watermarking: soundness and tamper detection. Within this abstraction we prove a sharp unconditional limit on robustness to independent symbol corruption. For an alphabet of size $q$, there is a critical corruption rate of $1 - 1/q$ such that no scheme with soundness, even relaxed to allow a fixed constant false positive probability on random content, can reliably detect tampering once an adversary can change more than this fraction of symbols. In particular, in the binary case no cryptographic watermark can remain robust if more than half of the encoded bits are modified. We also show that this threshold is tight by giving simple information-theoretic constructions that achieve soundness and tamper detection for all strictly smaller corruption rates. We then test experimentally whether this limit appears in practice by examining the recent image watermarking scheme of Gunn, Zhao, and Song (ICLR 2025). We show that a simple crop-and-resize operation reliably flipped about half of the latent signs and consistently prevented belief-propagation decoding from recovering the codeword, erasing the watermark while leaving the image visually intact.
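The $1 - 1/q$ threshold has an intuitive reading: once an adversary may resample that fraction of symbols uniformly, a tampered codeword sits at roughly the same expected Hamming distance from the original as completely unrelated random content, so no sound detector can separate the two. The simulation below is our own illustration of this point, not the paper's code, using an alphabet of size $q = 4$:

```python
import random

def hamming_frac(a, b):
    """Fraction of positions where two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def corrupt(word, p, q, rng):
    """Resample a fraction p of positions uniformly over the alphabet."""
    out = list(word)
    for i in rng.sample(range(len(word)), int(p * len(word))):
        out[i] = rng.randrange(q)
    return out

rng = random.Random(0)
n, q = 10_000, 4
codeword = [rng.randrange(q) for _ in range(n)]
fresh = [rng.randrange(q) for _ in range(n)]  # unrelated random content

# Unrelated content disagrees with the codeword in ~1 - 1/q = 0.75 of positions.
print(f"{hamming_frac(codeword, fresh):.3f}")
# A fully resampled codeword lands at the same ~0.75 distance, making it
# statistically indistinguishable from unmarked content.
print(f"{hamming_frac(codeword, corrupt(codeword, 1.0, q, rng)):.3f}")
```

In the binary case ($q = 2$) the same computation gives the familiar one-half threshold from the abstract.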


Formal Models and Convergence Analysis for Context-Aware Security Verification

Chaudhary, Ayush

arXiv.org Artificial Intelligence

Traditional security scanners fail when facing new attack patterns they haven't seen before. They rely on fixed rules and predetermined signatures, making them blind to novel threats. We present a fundamentally different approach: instead of memorizing specific attack patterns, we learn what makes systems genuinely secure. Our key insight is simple yet powerful: context determines vulnerability. A SQL query that's safe in one environment becomes dangerous in another. By modeling this context-vulnerability relationship, we achieve something remarkable: our system detects attacks it has never seen before. We introduce context-aware verification that learns from genuine system behavior. Through reconstruction learning on secure systems, we capture their essential characteristics. When an unknown attack deviates from these patterns, our system recognizes it, even without prior knowledge of that specific attack type. We prove this capability theoretically, showing detection rates improve exponentially with the context information $I(W;C)$. Our framework combines three components: (1) reconstruction learning that models secure behavior, (2) multi-scale graph reasoning that aggregates contextual clues, and (3) attention mechanisms guided by reconstruction differences. Extensive experiments validate our approach: detection accuracy jumps from 58 percent to 82 percent with full context, unknown attack detection improves by 31 percent, and our system maintains above 90 percent accuracy even against completely novel attack vectors.
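The reconstruction-learning idea can be illustrated with a deliberately simplified stand-in: learn a low-dimensional model of "secure" behavior and score deviations by reconstruction error, so that attacks never seen in training are still flagged when they leave the learned manifold. The sketch below substitutes PCA for the paper's learned representation; all names and data are synthetic assumptions of ours:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Secure" behavior: feature vectors concentrated near a low-dimensional
# subspace (a stand-in for genuine system behavior under normal operation).
basis = rng.normal(size=(2, 10))
normal = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 10))

# Reconstruction learning via PCA: keep the top-2 principal components.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]

def recon_error(x):
    """Reconstruction error: distance from the learned 'secure' subspace."""
    centered = x - mean
    recon = centered @ components.T @ components
    return np.linalg.norm(centered - recon, axis=-1)

# Threshold calibrated on secure traffic only -- no attack data needed.
threshold = np.quantile(recon_error(normal), 0.99)

# A "novel attack": behavior off the secure subspace, never seen in training.
attack = rng.normal(size=(100, 10)) * 2.0
detected = (recon_error(attack) > threshold).mean()
print(f"unknown-attack detection rate: {detected:.2f}")
```

The point of the toy is the training signal: the detector is calibrated purely on secure behavior, so nothing about any specific attack pattern is memorized.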


Cost-Driven Synthesis of Sound Abstract Interpreters

Gu, Qiuhan, Singh, Avaljot, Singh, Gagandeep

arXiv.org Artificial Intelligence

Constructing abstract interpreters that provide global soundness guarantees remains a major obstacle in abstract interpretation. We investigate whether modern LLMs can reduce this burden by leveraging them to synthesize sound, non-trivial abstract interpreters across multiple abstract domains in the setting of neural network verification. We formulate synthesis as a constrained optimization problem and introduce a novel, mathematically grounded cost function for measuring unsoundness under strict syntactic and semantic constraints. Based on this formulation, we develop a framework that unifies LLM-based generation, syntactic and semantic validation, and a quantitative cost-guided feedback mechanism. Empirical results demonstrate that our framework not only matches the quality of handcrafted transformers but, more importantly, discovers sound, high-precision transformers for complex nonlinear operators that are absent from the existing literature.
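To make the cost-guided loop concrete, here is a minimal sketch of one plausible ingredient: an empirical unsoundness cost for candidate interval transformers, estimated by sampling concrete inputs and counting escapes from the candidate's output interval. The transformer names and the sampling-based cost are our illustrative assumptions, not the paper's actual cost function:

```python
import math, random

def cost(transformer, f, rng, trials=10_000):
    """Empirical unsoundness cost: fraction of sampled concrete points whose
    true output escapes the candidate transformer's output interval. A sound
    transformer scores exactly 0; a positive score is a repair signal."""
    misses = 0
    for _ in range(trials):
        lo = rng.uniform(-5, 5)
        hi = lo + rng.uniform(0, 5)
        out_lo, out_hi = transformer(lo, hi)
        x = rng.uniform(lo, hi)  # concrete point inside the input interval
        if not (out_lo <= f(x) <= out_hi):
            misses += 1
    return misses / trials

# Sound interval transformer for tanh (monotone, so endpoints suffice).
sound_tanh = lambda lo, hi: (math.tanh(lo), math.tanh(hi))
# An unsound candidate an LLM might propose: the identity, valid only near 0.
unsound_tanh = lambda lo, hi: (lo, hi)

rng = random.Random(0)
print(cost(sound_tanh, math.tanh, rng))    # 0.0: never violated
print(cost(unsound_tanh, math.tanh, rng))  # positive: feedback for repair
```

A sampling cost like this can only refute soundness, never certify it; the paper's framework pairs the quantitative signal with semantic validation precisely for that reason.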


We are glad that all reviewers appreciated the soundness of our work, the importance of the hidden stratification (HS)

Neural Information Processing Systems

GEORGE trains an ERM model to obtain a feature representation and then trains a second, robust model. With tuning of learning-rate schedules and other hyperparameters (HPs), GEORGE's cost could be further reduced. In D.4, we define "inherent hardness" as the minimum possible worst-case subclass accuracy. We hope that building on this method may also be of independent interest. Our results are fairly insensitive (no significant performance drop) to reasonable variation in these HPs. Additional classification metrics are reported (ISIC omitted for space).


A Neurosymbolic Approach to Natural Language Formalization and Verification

Bayless, Sam, Buliani, Stefano, Cassel, Darion, Cook, Byron, Clough, Duncan, Delmas, Rémi, Diallo, Nafi, Erata, Ferhat, Feng, Nick, Giannakopoulou, Dimitra, Goel, Aman, Gokhale, Aditya, Hendrix, Joe, Hudak, Marc, Jovanović, Dejan, Kent, Andrew M., Kiesl-Reiter, Benjamin, Kuna, Jeffrey J., Labai, Nadia, Lilien, Joseph, Raghunathan, Divya, Rakamarić, Zvonimir, Razavi, Niloofar, Tautschnig, Michael, Torkamani, Ali, Weir, Nathaniel, Whalen, Michael W., Yao, Jianan

arXiv.org Artificial Intelligence

Large Language Models perform well at natural language interpretation and reasoning, but their inherent stochasticity limits their adoption in regulated industries like finance and healthcare that operate under strict policies. To address this limitation, we present a two-stage neurosymbolic framework that (1) uses LLMs with optional human guidance to formalize natural language policies, allowing fine-grained control of the formalization process, and (2) uses inference-time autoformalization to validate the logical correctness of natural language statements against those policies. When correctness is paramount, we perform multiple redundant formalization steps at inference time, cross-checking the formalizations for semantic equivalence. Our benchmarks demonstrate that our approach exceeds 99% soundness, indicating a near-zero false positive rate in identifying logical validity. Our approach produces auditable logical artifacts that substantiate the verification outcomes and can be used to improve the original text. The content generation and reasoning capabilities of Large Language Models (LLMs) continue to advance rapidly, demonstrating unprecedented improvements in coherence and analytical accuracy (Wei et al., 2022; Yao et al., 2023; Lewis et al., 2021). Despite these advances, their probabilistic nature and tendency to generate plausible but incorrect information (hallucinations, cf. Xu et al. 2024b) remain barriers to widespread adoption in regulated sectors. Industries such as healthcare, financial services, and legal practices have legal and regulatory obligations for accuracy and auditability that current LLM technology has yet to meet (Haltaufderheide & Ranisch, 2024). Companies develop institutional policies to ensure compliance with applicable laws and regulations. Such policies are typically captured in natural language (NL) documents that define rules, procedures, or guidelines.
A challenge thus emerges when organizations look to deploy LLMs to answer questions about such documents: can we develop guardrails to ensure that LLM outputs conform to institutional policies? Consider an airline implementing a chatbot to assist customer service representatives in navigating refund policies: if the chatbot incorrectly claims that a customer is eligible for a refund when they are not, this could lead to legal exposure and loss of customer trust. An effective guardrail would help representatives decide if they can rely on a chatbot response without spending additional human effort to verify it. The key concern would be to ensure that when the guardrail reports an answer is valid, it actually is.
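The cross-checking step can be sketched at the propositional level: formalize the same policy statement several times and accept only if the candidates are semantically equivalent. The example below uses exhaustive truth-table checking as a toy stand-in for the SMT-style equivalence queries such a system would more plausibly use; the policy and all function names are our own invention:

```python
from itertools import product

def equivalent(f, g, nvars):
    """Check two propositional formalizations for semantic equivalence by
    enumerating all truth assignments (feasible only for tiny formulas)."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=nvars))

# Policy: "a refund is allowed only if the ticket is refundable and the
# request is made within 24 hours" -- two independent formalization attempts.
form_a = lambda refundable, within24: refundable and within24
form_b = lambda refundable, within24: not (not refundable or not within24)
# A third, faulty attempt that silently weakens the policy.
form_c = lambda refundable, within24: refundable or within24

print(equivalent(form_a, form_b, 2))  # True: redundant attempts agree
print(equivalent(form_a, form_c, 2))  # False: disagreement flags this case
```

When independent formalization attempts disagree, the system can refuse to certify the statement rather than silently trusting one attempt, which is what drives the near-zero false positive rate the abstract reports.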