Goto

Collaborating Authors

 Law


Mind the (Belief) Gap: Group Identity in the World of LLMs

arXiv.org Artificial Intelligence

Social biases and belief-driven behaviors can significantly impact Large Language Models (LLMs) decisions on several tasks. As LLMs are increasingly used in multi-agent systems for societal simulations, their ability to model fundamental group psychological characteristics remains critical yet under-explored. In this study, we present a multi-agent framework that simulates belief congruence, a classical group psychology theory that plays a crucial role in shaping societal interactions and preferences. Our findings reveal that LLMs exhibit amplified belief congruence compared to humans, across diverse contexts. We further investigate the implications of this behavior on two downstream tasks: (1) misinformation dissemination and (2) LLM learning, finding that belief congruence in LLMs increases misinformation dissemination and impedes learning. To mitigate these negative impacts, we propose strategies inspired by: (1) contact hypothesis, (2) accuracy nudges, and (3) global citizenship framework. Our results show that the best strategies reduce misinformation dissemination by up to 37% and enhance learning by 11%. Bridging social psychology and AI, our work provides insights to navigate real-world interactions using LLMs while addressing belief-driven biases.


Zero-Trust Artificial Intelligence Model Security Based on Moving Target Defense and Content Disarm and Reconstruction

arXiv.org Artificial Intelligence

--This paper examines the challenges in distributing AI models through model zoos and file transfer mechanisms. Despite advancements in security measures, vulnerabilities persist, necessitating a multi-layered approach to mitigate risks effectively. The physical security of model files is critical, requiring stringent access controls and attack prevention solutions. This paper proposes a novel solution architecture composed of two prevention approaches. The first is Content Disarm and Reconstruction (CDR), which focuses on disarming serialization attacks that enable attackers to run malicious code as soon as the model is loaded. The second is protecting the model architecture and weights from attacks by using Moving T arget Defense (MTD), alerting the model structure, and providing verification steps to detect such attacks. The paper focuses on the highly exploitable Pickle and PyT orch file formats. It demonstrates a 100% disarm rate while validated against known AI model repositories and actual malware attacks from the HuggingFace model zoo. The swift evolution of Artificial Intelligence (AI) technology has made it a top priority for cybercriminals looking to obtain confidential information and intellectual property. These malicious individuals may try to exploit AI systems for their own gain, using specialized tactics alongside conventional IT methods. Given the broad spectrum of potential attack strategies, safeguards must be extensive. Experienced attackers frequently employ a combination of techniques to execute more intricate operations, which can render layered defenses ineffective. While adversarial AI model security [1, 2], privacy [3] and operational security aspects of AI receive much attention [4, 5], it's equally important to address the physical file security aspects of AI models.


Detecting Stylistic Fingerprints of Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) have distinct and consistent stylistic fingerprints, even when prompted to write in different writing styles. Detecting these fingerprints is important for many reasons, among them protecting intellectual property, ensuring transparency regarding AI-generated content, and preventing the misuse of AI technologies. In this paper, we present a novel method to classify texts based on the stylistic fingerprints of the models that generated them. We introduce an LLM-detection ensemble that is composed of three classifiers with varied architectures and training data. This ensemble is trained to classify texts generated by four well-known LLM families: Claude, Gemini, Llama, and OpenAI. As this task is highly cost-sensitive and might have severe implications, we want to minimize false-positives and increase confidence. We consider a prediction as valid when all three classifiers in the ensemble unanimously agree on the output classification. Our ensemble is validated on a test set of texts generated by Claude, Gemini, Llama, and OpenAI models, and achieves extremely high precision (0.9988) and a very low false-positive rate (0.0004). Furthermore, we demonstrate the ensemble's ability to distinguish between texts generated by seen and unseen models. This reveals interesting stylistic relationships between models. This approach to stylistic analysis has implications for verifying the originality of AI-generated texts and tracking the origins of model training techniques.


Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations

arXiv.org Artificial Intelligence

Commercial content moderation APIs are marketed as scalable solutions to combat online hate speech. However, the reliance on these APIs risks both silencing legitimate speech, called over-moderation, and failing to protect online platforms from harmful speech, known as under-moderation. To assess such risks, this paper introduces a framework for auditing black-box NLP systems. Using the framework, we systematically evaluate five widely used commercial content moderation APIs. Analyzing five million queries based on four datasets, we find that APIs frequently rely on group identity terms, such as ``black'', to predict hate speech. While OpenAI's and Amazon's services perform slightly better, all providers under-moderate implicit hate speech, which uses codified messages, especially against LGBTQIA+ individuals. Simultaneously, they over-moderate counter-speech, reclaimed slurs and content related to Black, LGBTQIA+, Jewish, and Muslim people. We recommend that API providers offer better guidance on API implementation and threshold setting and more transparency on their APIs' limitations. Warning: This paper contains offensive and hateful terms and concepts. We have chosen to reproduce these terms for reasons of transparency.


None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering

arXiv.org Artificial Intelligence

Multiple-choice exam questions with "None of the above" (NA) options have been extensively studied in educational testing, in which existing research suggests that they better assess true knowledge. However, their impact on Large Language Models (LLMs) evaluation remains underexplored. Through systematic experiments with 28 LLMs on the MMLU benchmark, we examine how NA options affect model performance and confidence calibration. Our analysis reveals that NA options, when used as the correct answer, lead to a consistent 30-50\% performance drop across models regardless of scale--suggesting that LLMs lack the meta-cognitive ability to systematically evaluate and reject all given options when none are correct. This degradation shows strong domain dependence, with minimal impact on mathematical reasoning (14.6\% drop) but severe effects on tasks requiring uncertainty handling like business ethics (48.1\% drop). Our results highlight important implications for benchmark design and raise questions about LLMs' ability to handle uncertainty in real-world applications.


Entailment vs. Verification for Partial-assignment Satisfiability and Enumeration

arXiv.org Artificial Intelligence

Many procedures for SA T -related problems, in particular for those requiring the complete enumeration of satisfying truth assignments, rely their efficiency and effectiveness on the detection of (possibly small) partial assignments satisfying an input formula. Surprisingly, there seems to be no unique universally-agreed definition of formula satisfaction by a partial assignment in the literature. In this paper we analyze in deep the issue of satisfaction by partial assignments, raising a flag about some ambiguities and subtleties of this concept, and investigating their practical consequences. We identify two alternative notions that are implicitly used in the literature, namely verification and entailment, which coincide if applied to CNF formulas but differ and present complementary properties if applied to non-CNF or to existentially-quantified formulas. We show that, although the former is easier to check and as such is implicitly used by most current search procedures, the latter has better theoretical properties, and can improve the efficiency and effectiveness of enumeration procedures. 1 Introduction Motivations.


SwiLTra-Bench: The Swiss Legal Translation Benchmark

arXiv.org Artificial Intelligence

In Switzerland legal translation is uniquely important due to the country's four official languages and requirements for multilingual legal documentation. However, this process traditionally relies on professionals who must be both legal experts and skilled translators -- creating bottlenecks and impacting effective access to justice. To address this challenge, we introduce SwiLTra-Bench, a comprehensive multilingual benchmark of over 180K aligned Swiss legal translation pairs comprising laws, headnotes, and press releases across all Swiss languages along with English, designed to evaluate LLM-based translation systems. Our systematic evaluation reveals that frontier models achieve superior translation performance across all document types, while specialized translation systems excel specifically in laws but under-perform in headnotes. Through rigorous testing and human expert validation, we demonstrate that while fine-tuning open SLMs significantly improves their translation quality, they still lag behind the best zero-shot prompted frontier models such as Claude-3.5-Sonnet. Additionally, we present SwiLTra-Judge, a specialized LLM evaluation system that aligns best with human expert assessments.


PEO: Improving Bi-Factorial Preference Alignment with Post-Training Policy Extrapolation

arXiv.org Artificial Intelligence

The alignment of large language models with human values presents a critical challenge, particularly when balancing conflicting objectives like helpfulness and harmlessness. Existing approaches, such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO), face notable limitations: RLHF suffers from instability and inefficiency in multi-objective optimization, while DPO lacks mechanisms for dynamic trade-offs. To address these challenges, we propose Post-Training Extrapolation Optimization (PEO), a novel and efficient framework for bi-factorial alignment. PEO generates a family of Pareto-optimal policies in a single training pass by leveraging a three-phase pipeline: (1) aspect-specific learning, (2) generalist initialization via interpolation, and (3) post-training optimization via extrapolation. PEO enables dynamic adaptation to diverse user preferences at inference time without retraining. Our comprehensive experiments across multiple LLMs demonstrate that PEO achieves superior Pareto fronts compared to baselines, offering improved flexibility and computational efficiency. Theoretical analyses further highlight PEO's capacity to overcome optimization bottlenecks, paving the way for scalable, personalized alignment.


Understanding Dynamic Diffusion Process of LLM-based Agents under Information Asymmetry

arXiv.org Artificial Intelligence

Large language models have been used to simulate human society using multi-agent systems. Most current social simulation research emphasizes interactive behaviors in fixed environments, ignoring information opacity, relationship variability and diffusion diversity. In this paper, we study the dynamics of information diffusion in 12 asymmetric open environments defined by information content and distribution mechanisms. We first present a general framework to capture the features of information diffusion. Then, we designed a dynamic attention mechanism to help agents allocate attention to different information, addressing the limitations of LLM-based attention. Agents start by responding to external information stimuli within a five-agent group, increasing group size and forming information circles while developing relationships and sharing information. Additionally, we observe the emergence of information cocoons, the evolution of information gaps, and the accumulation of social capital, which are closely linked to psychological, sociological, and communication theories.


Transgender cult leader linked to border agent killing maintains innocence, asks for vegan food in jail

FOX News

Post Millennial senior editor Andy Ngo unpacks what led to the arrests of members of an apparent transgender vegan cult on'The Ingraham Angle.' The apparent head of a radical transgender cult linked to six killings, including a U.S. Border Patrol agent, told a Maryland judge last week, "I haven't done anything wrong" while pleading for access to vegan food behind bars. "I might starve to death if you cannot answer me," Jack Amadeus LaSota, 34, who goes by "Ziz," told Judge Erich Bean during a bail hearing in Allegany County District Court in Maryland on Feb. 18, according to audio obtained by the San Francisco Chronicle. "I need the jail to be ordered for me to have a vegan diet. It's more important than whatever this hearing is."