

 Law


MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown promising performance on tasks that require reasoning, such as text-to-SQL, code generation, and debugging. However, regulatory frameworks with strict privacy requirements constrain their integration into sensitive systems. State-of-the-art LLMs are also proprietary, costly, and resource-intensive, making local deployment impractical. Consequently, utilizing such LLMs often requires sharing data with third-party providers, raising privacy concerns and risking noncompliance with regulations. Although fine-tuned small language models (SLMs) can outperform LLMs on certain tasks and be deployed locally to mitigate privacy concerns, they underperform on more complex tasks such as text-to-SQL translation. In this work, we introduce MaskSQL, a text-to-SQL framework that utilizes abstraction as a privacy protection mechanism to mask sensitive information in LLM prompts. Unlike redaction, which removes content entirely, or generalization, which broadens tokens, abstraction retains essential information while discarding unnecessary details, striking an effective privacy-utility balance for the text-to-SQL task. Moreover, by providing mechanisms to control the privacy-utility tradeoff, MaskSQL facilitates adoption across a broader range of use cases. Our experimental results show that MaskSQL outperforms leading SLM-based text-to-SQL models and achieves performance approaching state-of-the-art LLM-based models, while preserving privacy.
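To make the idea of abstraction concrete, the sketch below replaces sensitive literals in a question with typed placeholders before the prompt leaves the local machine, then restores them in the SQL the remote model returns. This is a hypothetical illustration, not MaskSQL's actual pipeline: the function names `abstract_prompt` and `unmask_sql`, the `[CATEGORY_n]` placeholder format, and the assumption that sensitive terms are already identified are all ours.

```python
def abstract_prompt(question: str, sensitive_terms: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Replace each sensitive literal with a typed placeholder (hypothetical scheme).

    sensitive_terms maps a literal (e.g. a cell value) to a category such as "NAME".
    Returns the masked question and a placeholder -> literal map that stays local.
    """
    masked = question
    mapping: dict[str, str] = {}
    counters: dict[str, int] = {}
    # Longer literals first, so a short term cannot clobber part of a longer one.
    for term in sorted(sensitive_terms, key=len, reverse=True):
        if term not in masked:
            continue
        category = sensitive_terms[term]
        counters[category] = counters.get(category, 0) + 1
        placeholder = f"[{category}_{counters[category]}]"
        masked = masked.replace(term, placeholder)
        mapping[placeholder] = term
    return masked, mapping


def unmask_sql(sql: str, mapping: dict[str, str]) -> str:
    """Substitute the original literals back into SQL generated from the masked prompt."""
    for placeholder, original in mapping.items():
        sql = sql.replace(placeholder, original)
    return sql


question = "How many orders did Alice Smith place in Boston?"
masked, mapping = abstract_prompt(question, {"Alice Smith": "NAME", "Boston": "CITY"})
print(masked)  # How many orders did [NAME_1] place in [CITY_1]?

# Pretend a remote LLM returned this query for the masked question.
remote_sql = "SELECT COUNT(*) FROM orders WHERE customer = '[NAME_1]' AND city = '[CITY_1]'"
print(unmask_sql(remote_sql, mapping))
```

The placeholder keeps the category (a name, a city) that the model needs to write a correct `WHERE` clause while discarding the actual value, which is the privacy-utility tradeoff the abstract describes.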


Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization

arXiv.org Artificial Intelligence

Current LLM unlearning methods face a critical security vulnerability that undermines their fundamental purpose: while they appear to successfully remove sensitive or harmful knowledge, this "forgotten" information remains precariously recoverable through relearning attacks. We identify the root cause: conventional methods that optimize the forgetting loss at individual data points drive model parameters toward sharp minima in the loss landscape. In these unstable regions, even minimal parameter perturbations can drastically alter the model's behavior. Consequently, relearning attacks exploit this vulnerability by using just a few fine-tuning samples to navigate the steep gradients surrounding these unstable regions, thereby rapidly recovering knowledge that was supposedly erased. This exposes a critical robustness gap between apparent unlearning and actual knowledge removal. To address this issue, we propose StableUN, a bi-level feedback-guided optimization framework that explicitly seeks more stable parameter regions via neighborhood-aware optimization. It integrates forgetting feedback, which uses adversarial perturbations to probe parameter neighborhoods, with remembering feedback to preserve model utility, aligning the two objectives through gradient projection. Experiments on the WMDP and MUSE benchmarks demonstrate that our method is significantly more robust against both relearning and jailbreaking attacks while maintaining competitive utility.
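The two mechanisms named above, probing a parameter neighborhood with an adversarial perturbation and resolving conflicts between forgetting and remembering gradients by projection, can be sketched on plain NumPy vectors. This is a minimal stand-in under stated assumptions, not StableUN itself: the perturbation follows the SAM-style recipe, the projection follows the PCGrad-style rule, and `forget_grad` is assumed to return the gradient of a forgetting objective that is being *minimized*.

```python
import numpy as np
from typing import Callable

Grad = Callable[[np.ndarray], np.ndarray]


def project_conflicting(g_forget: np.ndarray, g_retain: np.ndarray) -> np.ndarray:
    """Drop the component of g_forget that points against g_retain (PCGrad-style rule)."""
    dot = float(np.dot(g_forget, g_retain))
    if dot >= 0.0:
        return g_forget  # no conflict: keep the forgetting direction unchanged
    return g_forget - (dot / (float(np.dot(g_retain, g_retain)) + 1e-12)) * g_retain


def perturbed_gradient(grad_fn: Grad, theta: np.ndarray, rho: float) -> np.ndarray:
    """Gradient at the worst-case point of radius rho along the gradient (SAM-style probe)."""
    g = grad_fn(theta)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return grad_fn(theta + eps)


def unlearning_step(theta: np.ndarray, forget_grad: Grad, retain_grad: Grad,
                    lr: float = 0.1, rho: float = 0.05) -> np.ndarray:
    """One hypothetical update combining neighborhood-aware forgetting and utility retention."""
    g_f = perturbed_gradient(forget_grad, theta, rho)  # forgetting feedback from the neighborhood
    g_r = retain_grad(theta)                           # remembering feedback at the current point
    return theta - lr * (project_conflicting(g_f, g_r) + g_r)
```

Evaluating the forgetting gradient at a perturbed point penalizes solutions whose loss rises steeply nearby, which is the intuition behind preferring flatter regions over sharp minima; the projection ensures the forgetting step never directly undoes the retention step.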


LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection

arXiv.org Artificial Intelligence

The rapid advancement of diffusion-based image generators has made it increasingly difficult to distinguish generated from real images. This erodes trust in digital media, making it critical to develop generated-image detectors that remain reliable across different generators. While recent approaches leverage diffusion denoising cues, they typically rely on single-step reconstruction errors and overlook the sequential nature of the denoising process. In this work, we propose LATTE (LATent Trajectory Embedding), a novel approach that models the evolution of latent embeddings across multiple denoising steps. Instead of treating each denoising step in isolation, LATTE captures the trajectory of these representations, revealing subtle and discriminative patterns that distinguish real from generated images. Experiments on several benchmarks, such as GenImage, Chameleon, and Diffusion Forensics, show that LATTE achieves superior performance, especially in challenging cross-generator and cross-dataset scenarios, highlighting the potential of latent trajectory modeling. The code is available at https://github.com/AnaMVasilcoiu/LATTE-Diffusion-Detector.
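The difference between single-step and trajectory-based cues can be illustrated by turning a sequence of per-step latent embeddings into one feature vector that also encodes how the embedding moves between steps. This is a hand-crafted, hypothetical featurization for illustration only; LATTE learns its trajectory representation, and the function name and feature layout here are our assumptions.

```python
import numpy as np


def trajectory_features(latents: np.ndarray) -> np.ndarray:
    """Featurize a latent trajectory (hypothetical layout).

    latents has shape (T, d): one d-dimensional embedding per denoising step.
    Returns the per-step embeddings, the step-to-step deltas, and the delta norms,
    concatenated into one flat vector of length T*d + (T-1)*d + (T-1).
    """
    if latents.ndim != 2 or latents.shape[0] < 2:
        raise ValueError("expected an array of shape (T, d) with T >= 2")
    deltas = np.diff(latents, axis=0)            # (T-1, d): movement between consecutive steps
    step_norms = np.linalg.norm(deltas, axis=1)  # (T-1,): size of each movement
    return np.concatenate([latents.ravel(), deltas.ravel(), step_norms])


# Four denoising steps of a 3-dimensional toy embedding.
features = trajectory_features(np.arange(12, dtype=float).reshape(4, 3))
print(features.shape)  # (24,)
```

A single-step detector would see only one row of `latents`; the delta terms are what let a downstream classifier pick up patterns in how real and generated images evolve across the denoising process.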


FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation

arXiv.org Artificial Intelligence

Federated Learning (FL) enables collaborative model training across multiple clients without sharing clients' private data. However, the diverse and often conflicting biases present across clients pose significant challenges to model fairness. Current fairness-enhancing FL solutions often fall short, as they typically mitigate biases for a single, usually binary, sensitive attribute, while ignoring the heterogeneous fairness needs that exist in real-world settings. Moreover, these solutions often evaluate unfairness reduction only on the server side, hiding persistent unfairness at the individual client level. To support more robust and reproducible fairness research in FL, we introduce a comprehensive benchmarking framework for fairness-aware FL at both the global and client levels. Our contributions are three-fold: (1) we introduce FeDa4Fair, a library to create tabular datasets tailored to evaluating fair FL methods under heterogeneous client bias; (2) we release four bias-heterogeneous datasets and corresponding benchmarks to compare fairness mitigation methods in a controlled environment; (3) we provide ready-to-use functions for evaluating fairness outcomes on these datasets.
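The point that server-side evaluation can hide client-level unfairness is easy to reproduce with a small example. The sketch below computes demographic parity difference per client and on the pooled predictions; it is a hypothetical helper written for illustration, not FeDa4Fair's API, and demographic parity is just one of the fairness metrics such a benchmark might use.

```python
import numpy as np


def demographic_parity_diff(y_pred, sensitive) -> float:
    """Largest gap in positive-prediction rate between sensitive groups."""
    y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
    rates = [y_pred[sensitive == g].mean() for g in np.unique(sensitive)]
    return float(max(rates) - min(rates))


def fairness_report(clients: dict[str, tuple[np.ndarray, np.ndarray]]) -> dict[str, float]:
    """Per-client gaps plus a 'global' gap computed on all clients' predictions pooled."""
    report = {name: demographic_parity_diff(p, s) for name, (p, s) in clients.items()}
    pooled_pred = np.concatenate([p for p, _ in clients.values()])
    pooled_sens = np.concatenate([s for _, s in clients.values()])
    report["global"] = demographic_parity_diff(pooled_pred, pooled_sens)
    return report


# Two clients biased in opposite directions (hypothetical data).
clients = {
    "client_A": (np.array([1, 1, 0, 0]), np.array([0, 0, 1, 1])),  # favors group 0
    "client_B": (np.array([0, 0, 1, 1]), np.array([0, 0, 1, 1])),  # favors group 1
}
print(fairness_report(clients))  # {'client_A': 1.0, 'client_B': 1.0, 'global': 0.0}
```

Each client is maximally unfair, yet the pooled metric reports perfect parity because the opposite biases cancel, which is exactly why the framework evaluates fairness at both the global and client levels.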


Frankentext: Stitching random text fragments into long-form narratives

arXiv.org Artificial Intelligence

"Though stitched together from disparate parts, the creature emerges as a disturbingly articulate and clever being."

Frankentexts: long-form narratives constructed by LLMs

Code and data will be released after the double-blind review process. However, they also identify subtle issues (e.g., abrupt

That said, the LLMs' novel recombination of these writings (a feat virtually impossible for a human)

Our experiments focus on 500-word generations, and we leave the exploration of longer texts to future work. A snippet refers to a paragraph. We note that the paragraphs could be sourced from anywhere (e.g., Project Gutenberg, news, etc.), but we select Books3 because our writing prompts likely benefit from modern writing. First, random paragraphs are sampled from a large corpus of human-written books.

Frankentext, which is further edited into a coherent and faithful final version (see Algorithm 1). We also do not specify how many snippets should be used in the final story.

Polishing the draft: The first draft may contain writing issues such as contradictions (e.g., temporally conflicting actions, points of view, or character traits), irrelevant content (e.g., unfiltered citations or filler text), and mechanical problems (e.g., grammar, phrasing, or pronoun mismatches).

The MCP interface supports two operations: search, which submits a query to the index, and fetch, which retrieves the full text of a result. Gemini typically makes 15-20 calls per generation to retrieve relevant snippets (Table 10).

Vanilla generation contains more AI keywords (in blue) compared to Frankentexts (I).

Writing Prompt: You are a baby. You are trying not to let anyone know you are a freakishly genius baby. The other babies at daycare don't quite get what you're on about.

Gemini 2.5 Pro - Vanilla Generation | Gemini 2.5 Pro - Frankentext (5K snippets)

"Goo," I offered, a carefully calibrated emission of sound designed to convey contentment. Internally, I was running simulations on the structural integrity of the mobile dangling precariously above my cot. The Large Female Hominid, designated 'Mom,' beamed. Liam was attempting to gum Chloe's earlobe. Neither celestial gods nor the great sages know my origin. But this deception is a constant struggle. Here, we play our games, and another baby, a real space case, would fasten his bib tight because he tried to kill everybody. It didn't matter if it was He'd hit his grandmother if she had a bonnet on. That's why we called him 'Killer.' He just loved to kill you. I try to organize them. "Gentlemen, I make the motion that these United But there is no response.
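The first stage described above, sampling random paragraphs from a large corpus and handing them to an LLM alongside a writing prompt, can be sketched as follows. This is a hypothetical illustration: the function names and the prompt wording are ours, the corpus here is a toy list rather than Books3, and the drafting and polishing calls to the LLM are not shown.

```python
import random
from typing import List, Optional


def sample_snippets(paragraphs: List[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Draw k distinct random paragraphs (snippets) from a corpus; seed makes runs repeatable."""
    if not 0 < k <= len(paragraphs):
        raise ValueError("k must be between 1 and the corpus size")
    rng = random.Random(seed)
    return rng.sample(paragraphs, k)


def build_draft_prompt(writing_prompt: str, snippets: List[str]) -> str:
    """Assemble a hypothetical drafting prompt listing numbered snippets."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(snippets, start=1))
    return (
        f"Writing prompt: {writing_prompt}\n"
        "Write a story that stitches together passages from the numbered snippets below.\n"
        f"Snippets:\n{numbered}"
    )


corpus = [f"Paragraph {i} from some human-written book." for i in range(1000)]
snippets = sample_snippets(corpus, k=5, seed=0)
print(build_draft_prompt("You are a baby.", snippets))
```

Using `random.Random(seed)` rather than the module-level functions keeps the sampling reproducible without touching global random state.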


AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

arXiv.org Artificial Intelligence

Audio Large Language Models (ALLMs) have gained widespread adoption, yet their trustworthiness remains underexplored. Existing evaluation frameworks, designed primarily for text, fail to address unique vulnerabilities introduced by audio's acoustic properties. We identify significant trustworthiness risks in ALLMs arising from non-semantic acoustic cues, including timbre, accent, and background noise, which can manipulate model behavior. We propose AudioTrust, a comprehensive framework for systematic evaluation of ALLM trustworthiness across audio-specific risks. AudioTrust encompasses six key dimensions: fairness, hallucination, safety, privacy, robustness, and authentication. The framework implements 26 distinct sub-tasks using a curated dataset of over 4,420 audio samples from real-world scenarios, including daily conversations, emergency calls, and voice assistant interactions. We conduct comprehensive evaluations across 18 experimental configurations using human-validated automated pipelines. Our evaluation of 14 state-of-the-art open-source and closed-source ALLMs reveals significant limitations when confronted with diverse high-risk audio scenarios, providing insights for secure deployment of audio models. Code and data are available at https://github.com/JusperLee/AudioTrust.


Neglected Risks: The Disturbing Reality of Children's Images in Datasets and the Urgent Call for Accountability

arXiv.org Artificial Intelligence

Including children's images in datasets has raised ethical concerns, particularly regarding privacy, consent, data protection, and accountability. These datasets, often built by scraping publicly available images from the Internet, can expose children to risks such as exploitation, profiling, and tracking. Despite the growing recognition of these issues, approaches for addressing them remain limited. We explore the ethical implications of using children's images in AI datasets and propose a pipeline to detect and remove such images. As a use case, we built the pipeline on a Vision-Language Model (VLM) applied to Visual Question Answering (VQA) and tested it on the #PraCegoVer dataset. We also evaluate the pipeline on a subset of 100,000 images from the Open Images V7 dataset to assess its effectiveness in detecting and removing images of children. The pipeline serves as a baseline for future research, providing a starting point for more comprehensive tools and methodologies. While we leverage existing models trained on potentially problematic data, our goal is to expose and address this issue. We do not advocate for training or deploying such models, but instead call for urgent community reflection and action to protect children's rights. Ultimately, we aim to encourage the research community to exercise extra care in creating new datasets and to inspire the development of tools to protect the fundamental rights of vulnerable groups, particularly children.
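A VQA-based filter of the kind described reduces to asking the model a yes/no question per image and partitioning the dataset on the answer. The sketch below assumes a caller-supplied `ask_vlm(image_id, question)` function standing in for whatever VLM is used; the question text and function names are hypothetical, not the paper's exact setup.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical question; the paper's actual prompt may differ.
QUESTION = "Is there a child in this image? Answer yes or no."


def filter_child_images(
    image_ids: Iterable[str], ask_vlm: Callable[[str, str], str]
) -> Tuple[List[str], List[str]]:
    """Split images into (kept, removed) based on a VLM's yes/no answer.

    Conservative rule: an image is kept only if the answer clearly starts with "no";
    "yes" and any ambiguous answer lead to removal.
    """
    kept: List[str] = []
    removed: List[str] = []
    for image_id in image_ids:
        answer = ask_vlm(image_id, QUESTION).strip().lower()
        (kept if answer.startswith("no") else removed).append(image_id)
    return kept, removed


# A stub in place of a real VLM, for demonstration only.
def fake_vlm(image_id: str, question: str) -> str:
    return "Yes" if "kid" in image_id else "No"


kept, removed = filter_child_images(["kid1.jpg", "cat.jpg", "kid2.jpg"], fake_vlm)
print(kept, removed)  # ['cat.jpg'] ['kid1.jpg', 'kid2.jpg']
```

Treating uncertain answers as removals trades some false positives for fewer missed images of children, which matches the protective goal stated above.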


An Annotation Scheme for Factuality and its Application to Parliamentary Proceedings

arXiv.org Artificial Intelligence

Factuality assesses the extent to which a language utterance relates to real-world information; it determines whether utterances correspond to facts, possibilities, or imaginary situations, and as such, it is instrumental for fact checking. Factuality is a complex notion that relies on multiple linguistic signals, and has been studied in various disciplines. We present a complex, multi-faceted annotation scheme of factuality that combines concepts from a variety of previous works. We developed the scheme for Hebrew, but we believe it can be adapted to other languages. We also present a set of almost 5,000 sentences in the domain of parliamentary discourse that we manually annotated according to this scheme. We report on inter-annotator agreement, and experiment with various approaches to automatically predict (some features of) the scheme, in order to extend the annotation to a large corpus.
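Inter-annotator agreement on categorical labels is commonly reported with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below implements it from the definition; the abstract does not say which agreement measure the authors used, so treat the choice of kappa and the toy labels as assumptions.

```python
from collections import Counter
from typing import List


def cohens_kappa(a: List[str], b: List[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(count_a[lbl] * count_b[lbl] for lbl in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["fact", "fact", "possibility", "fact"]
annotator_2 = ["fact", "possibility", "possibility", "fact"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Here raw agreement is 0.75, but because both annotators favor "fact", half of that agreement is expected by chance, so kappa is 0.5.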


From Fragile to Certified: Wasserstein Audits of Group Fairness Under Distribution Shift

arXiv.org Artificial Intelligence

Group-fairness metrics (e.g., equalized odds) can vary sharply across resamples and are especially brittle under distribution shift, undermining reliable audits. We propose a Wasserstein distributionally robust framework that certifies worst-case group fairness over a ball of plausible test distributions centered at the empirical law. Our formulation unifies common group fairness notions via a generic conditional-probability functional and defines $\varepsilon$-Wasserstein Distributional Fairness ($\varepsilon$-WDF) as the audit target. Leveraging strong duality, we derive tractable reformulations and an efficient estimator (DRUNE) for $\varepsilon$-WDF. We prove feasibility and consistency and establish finite-sample certification guarantees for auditing fairness, along with quantitative bounds under smoothness and margin conditions. Across standard benchmarks and classifiers, $\varepsilon$-WDF delivers stable fairness assessments under distribution shift, providing a principled basis for auditing and certifying group fairness beyond observational data.
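One natural way to write a worst-case audit target of this kind is shown below, using equalized odds as the example fairness functional. This is a generic distributionally robust formulation consistent with the description above, not necessarily the paper's exact definition; the symbols are our assumptions: $h$ is the classifier, $A$ the sensitive attribute, $Y$ the label, $\hat{P}_n$ the empirical distribution, and $W_c$ the Wasserstein distance with ground cost $c$.

```latex
\varepsilon\text{-WDF}(h) \;=\; \sup_{Q \,:\, W_c(Q,\,\hat{P}_n) \,\le\, \varepsilon} \Delta(h; Q),
\qquad
\Delta_{\mathrm{EO}}(h; Q) \;=\; \max_{y \in \{0,1\}}
\bigl|\, Q\bigl(h(X)=1 \mid A=0,\, Y=y\bigr) - Q\bigl(h(X)=1 \mid A=1,\, Y=y\bigr) \bigr|
```

With $\varepsilon = 0$ the ball contains only $\hat{P}_n$ and the target reduces to the ordinary empirical fairness gap; increasing $\varepsilon$ certifies the gap against every test distribution within that transport budget.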


Bubble, Bubble, AI's Rumble: Why Global Financial Regulatory Incident Reporting is Our Shield Against Systemic Stumbles

arXiv.org Artificial Intelligence

"Double, double toil and trouble; Fire burn and cauldron bubble." As Shakespeare's witches foretold chaos through cryptic prophecies, modern capital markets grapple with systemic risks concealed by opaque AI systems. According to the IMF, the August 5, 2024, plunge in Japanese and U.S. equities can be linked to algorithmic trading, yet the event is absent from existing AI incident databases, which exemplifies this transparency crisis. Current AI incident databases, reliant on crowdsourcing or news scraping, systematically overlook capital market anomalies, particularly in algorithmic and high-frequency trading. We address this critical gap by proposing a regulatory-grade global database that synthesizes post-trade reporting frameworks with proven incident documentation models from healthcare and aviation. Our framework's temporal data omission technique, which masks timestamps while preserving percentage-based metrics, enables cross-jurisdictional analysis of emerging risks while safeguarding confidential business information. Validation on synthetic data (n=2,999 incidents, modelled after real-life published incidents, sentiments, and data) reveals three patterns: systemic risks transcending geographical boundaries, market manipulation clusters distinctly identifiable via K-means, and AI system typology exerting significantly greater influence on trading behaviour than geographical location. This tripartite solution empowers regulators with unprecedented cross-jurisdictional oversight, financial institutions with seamless compliance integration, and investors with critical visibility into previously obscured AI-driven vulnerabilities. We call for immediate action to strengthen risk management and foster resilience in AI-driven financial markets against the volatile "cauldron" of AI-driven systemic risks.
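The temporal data omission idea, dropping absolute timestamps while keeping relative, percentage-based movements, can be illustrated on a short price series. This is a hypothetical sketch: the record fields `timestamp` and `price`, the choice of the earliest record as the baseline, and the ordinal `step` field are our assumptions, not the paper's schema.

```python
from typing import Dict, List


def omit_timestamps(records: List[Dict]) -> List[Dict]:
    """Replace absolute times and prices with ordinal steps and % change from the first record.

    Each input record is assumed to hold a sortable "timestamp" and a numeric "price".
    Output records keep only event order and relative magnitude.
    """
    if not records:
        return []
    ordered = sorted(records, key=lambda r: r["timestamp"])
    base = ordered[0]["price"]
    if base == 0:
        raise ValueError("baseline price must be non-zero")
    return [
        {"step": i, "pct_change": round(100.0 * (r["price"] - base) / base, 4)}
        for i, r in enumerate(ordered)
    ]


incident = [
    {"timestamp": "2024-08-05T09:02:00", "price": 95.0},
    {"timestamp": "2024-08-05T09:00:00", "price": 100.0},
    {"timestamp": "2024-08-05T09:04:00", "price": 90.5},
]
print(omit_timestamps(incident))
```

Percentage changes are comparable across instruments and jurisdictions regardless of price level, while the exact trade times that could reveal a firm's strategy are not disclosed; note that the ordinal step still leaks event order, which a real scheme would need to weigh.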