 Law


Japan town retracts bear sighting warning sparked by AI image

The Japan Times

A Japanese town has deleted a social media post warning of a bear sighting after discovering that a picture it had received showing the fearsome creature was generated using artificial intelligence. Similar fake images have been circulating online as fear of bears runs high in the country, where the animals have killed a record 13 people this year. "The town prioritized informing residents to avoid danger, but we apologize for causing any anxiety or confusion," the town of Onagawa, Miyagi Prefecture, said on its official X social media account on Wednesday.


Australia clamps down on 'nudify' sites used for AI-generated child abuse

Al Jazeera

Internet users in Australia have been blocked from accessing several websites that used artificial intelligence to create child sexual exploitation material, the country's internet regulator has announced. The three "nudify" sites withdrew from Australia following an official warning, eSafety Commissioner Julie Inman Grant said on Thursday. Grant said such "nudify" services, which allow users to make images of real people appear naked using AI, have had a "devastating" effect in Australian schools. "We took enforcement action in September because this provider failed to put in safeguards to prevent its services being used to create child sexual exploitation material and were even marketing features like undressing 'any girl,' and with options for 'schoolgirl' image generation and features such as 'sex mode,'" Grant said in a statement. The development comes after Grant's office issued a formal warning to the United Kingdom-based company behind the sites in September, threatening civil penalties of up to 49.5 million Australian dollars ($32.2m) if it did not introduce safeguards to prevent image-based abuse.


Bridging the Unavoidable A Priori: A Framework for Comparative Causal Modeling

arXiv.org Machine Learning

AI/ML models have rapidly gained prominence both as innovations for solving previously unsolved problems and for their unintended consequences from amplifying human biases. Advocates for responsible AI/ML have sought ways to draw on the richer causal models of system dynamics to better inform the development of responsible AI/ML. However, a major barrier to advancing this work is the difficulty of bringing together methods rooted in different underlying assumptions (i.e., Dana Meadows's "the unavoidable a priori"). This paper brings system dynamics and structural equation modeling together into a common mathematical framework that can be used to generate systems from distributions, develop methods, and compare results to inform the underlying epistemology of system dynamics for data science and AI/ML applications.


A Set of Rules for Model Validation

arXiv.org Machine Learning

The validation of a data-driven model is the process of assessing the model's ability to generalize to new, unseen data in the population of interest. This paper proposes a set of general rules for model validation. These rules are designed to help practitioners create reliable validation plans and report their results transparently. While no validation scheme is flawless, these rules can help practitioners ensure their strategy is sufficient for practical use, openly discuss any limitations of their validation strategy, and report clear, comparable performance metrics.

Keywords: Validation, Cross-validation

1. Introduction

Model validation is a fundamental task in all modern data-driven systems, whether they fall under the broad categories of Statistics, Machine Learning (ML), Artificial Intelligence (AI), or more specialized fields like chemometrics. Validation has become a major focus for regulatory and standardization bodies, with key reports and standards highlighting the growing concern for ensuring the trustworthiness and reliability of data-driven models:

NIST AI Risk Management Framework (AI RMF 1.0, 2023): Published by the U.S. Department of Commerce, this framework provides management techniques to address the risks and ensure the trustworthiness of AI systems, with validation as a core component.

The EU AI Act of 2024, a landmark piece of EU legislation that categorizes AI systems by risk level, where validation is not defined as a best practice but a legal requirement within the conformity assessment.

The ISO/IEC TS 4213:2022, by the International Organization for Standardization (ISO), describes approaches and methods to ensure the rele-

The IEEE P2841-2022 is a recommended practice for the framework and process for deep learning evaluation.

Email address: josecamacho@ugr.es
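
The abstract names cross-validation as a keyword but the excerpt does not show a procedure. As a minimal sketch of the general idea (not the paper's rules), the following estimates out-of-sample error with k-fold cross-validation on a one-variable least-squares line. The function names, the choice of mean absolute error, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def cross_validate(xs, ys, k=5, seed=0):
    """Return the mean absolute error on each held-out fold."""
    errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        # Fit only on samples outside the held-out fold.
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        slope, intercept = fit_line(train_x, train_y)
        # Score only on samples the model has not seen.
        errors.append(mean(abs(ys[i] - (slope * xs[i] + intercept)) for i in held_out))
    return errors


if __name__ == "__main__":
    xs = [float(i) for i in range(20)]
    ys = [2.0 * x + 1.0 for x in xs]  # noise-free toy data
    print(cross_validate(xs, ys, k=5))
```

Reporting the per-fold errors rather than a single average keeps the spread visible, which is in the spirit of the transparent reporting the abstract asks for.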


Alignment of large language models with constrained learning

arXiv.org Artificial Intelligence

We study the problem of computing an optimal large language model (LLM) policy for the constrained alignment problem, where the goal is to maximize a primary reward objective while satisfying constraints on secondary utilities. Despite the popularity of Lagrangian-based LLM policy search in constrained alignment, iterative primal-dual methods often fail to converge, and non-iterative dual-based methods do not achieve optimality in the LLM parameter space. To address these challenges, we employ Lagrangian duality to develop an iterative dual-based alignment method that alternates between updating the LLM policy via Lagrangian maximization and updating the dual variable via dual descent. In theory, we characterize the primal-dual gap between the primal value in the distribution space and the dual value in the LLM parameter space. We further quantify the optimality gap of the learned LLM policies at near-optimal dual variables with respect to both the objective and the constraint functions. These results prove that dual-based alignment methods can find an optimal constrained LLM policy, up to an LLM parametrization gap. We demonstrate the effectiveness and merits of our approach through extensive experiments conducted on the PKU-SafeRLHF and Anthropic HH-RLHF datasets.
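
The alternation described above (Lagrangian maximization for the policy, dual descent for the multiplier) can be illustrated on a toy problem. This is only a sketch under strong assumptions that are not in the abstract: the policy is a distribution over three fixed responses rather than an LLM, the objective is KL-regularized toward a reference policy with coefficient `beta`, and the Lagrangian maximizer is therefore available in closed form in distribution space. All names and numbers are hypothetical.

```python
import math


def lagrangian_maximizer(ref, reward, utility, lam, beta):
    """Maximize E[r + lam * g] - beta * KL(pi || ref) over distributions pi.

    Under the KL-regularization assumption the maximizer is
    pi(y) proportional to ref(y) * exp((r(y) + lam * g(y)) / beta).
    """
    logits = [math.log(p) + (r + lam * g) / beta
              for p, r, g in zip(ref, reward, utility)]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def dual_descent(ref, reward, utility, threshold, beta=1.0, step=0.5, iters=500):
    """Alternate policy maximization with projected descent on the dual variable.

    The constraint is E_pi[g] >= threshold; the dual gradient at lam is the
    constraint slack E_pi[g] - threshold under the current maximizer.
    """
    lam = 0.0
    for _ in range(iters):
        policy = lagrangian_maximizer(ref, reward, utility, lam, beta)
        slack = sum(p * g for p, g in zip(policy, utility)) - threshold
        lam = max(0.0, lam - step * slack)  # keep the multiplier non-negative
    return lagrangian_maximizer(ref, reward, utility, lam, beta), lam


if __name__ == "__main__":
    ref = [1 / 3, 1 / 3, 1 / 3]      # hypothetical reference policy
    reward = [1.0, 0.5, 0.0]         # primary reward per response
    utility = [0.0, 0.5, 1.0]        # secondary (e.g. safety) utility per response
    policy, lam = dual_descent(ref, reward, utility, threshold=0.6)
    print(policy, lam)
```

In this toy setting the multiplier grows while the utility constraint is violated and settles once the expected utility meets the threshold. The gap the abstract analyzes arises because a real LLM can only approximate the distribution-space maximizer through its parameters, which this sketch does not model.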


Trustless Federated Learning at Edge-Scale: A Compositional Architecture for Decentralized, Verifiable, and Incentive-Aligned Coordination

arXiv.org Artificial Intelligence

Artificial intelligence is retracing the Internet's path from centralized provision to distributed creation. Initially, resource-intensive computation concentrates within institutions capable of training and serving large models. Eventually, as federated learning matures, billions of edge devices holding sensitive data will be able to collectively improve models without surrendering raw information, enabling both contribution and consumption at scale. This democratic vision remains unrealized due to certain compositional gaps: aggregators handle updates without accountability, economic mechanisms are lacking and even when present remain vulnerable to gaming, coordination serializes state modifications limiting scalability, and governance permits retroactive manipulation. This work addresses these gaps by leveraging cryptographic receipts to prove aggregation correctness, geometric novelty measurement to prevent incentive gaming, parallel object ownership to achieve linear scalability, and time-locked policies to check retroactive manipulation. The product of this work is a design architecture--not an actual implementation--that seeks to pass the baton in the race toward truly collaborative intelligence: an intelligence of the people, by the people, for the people.


Towards Trustworthy Legal AI through LLM Agents and Formal Reasoning

arXiv.org Artificial Intelligence

The rationality of law manifests in two forms: substantive rationality, which concerns the fairness or moral desirability of outcomes, and formal rationality, which requires legal decisions to follow explicitly stated, general, and logically coherent rules. Existing LLM-based systems excel at surface-level text analysis but lack the guarantees required for principled jurisprudence. We introduce L4M, a novel framework that combines adversarial LLM agents with SMT-solver-backed proofs to unite the interpretive flexibility of natural language with the rigor of symbolic verification. The pipeline consists of three phases: (1) Statute Formalization, where domain-specific prompts convert legal provisions into logical formulae; (2) Dual Fact and Statute Extraction, in which prosecutor- and defense-aligned LLMs independently map case narratives to fact tuples and statutes, ensuring role isolation; and (3) Solver-Centric Adjudication, where an autoformalizer compiles both parties' arguments into logic constraints, and unsat cores trigger iterative self-critique until a satisfiable formula is achieved, which is then verbalized by a Judge-LLM into a transparent verdict and optimized sentence. Experimental results on public benchmarks show that our system surpasses advanced LLMs including GPT-o4-mini, DeepSeek-V3, and Claude 4 as well as state-of-the-art Legal AI baselines, while providing rigorous and explainable symbolic justifications.


GuardTrace-VL: Detecting Unsafe Multimodel Reasoning via Iterative Safety Supervision

arXiv.org Artificial Intelligence

Multimodal large reasoning models (MLRMs) are increasingly deployed for vision-language tasks that produce explicit intermediate rationales. However, reasoning traces can contain unsafe content even when the final answer is non-harmful, creating deployment risks. Existing multimodal safety guards primarily evaluate only the input question and the final answer, neglecting the intermediate reasoning process. This oversight allows undetected harm, such as biased inferences or policy-violating use of visual context, to emerge during reasoning. We introduce GuardTrace-VL, a vision-aware safety auditor that monitors the full Question-Thinking-Answer (QTA) pipeline via joint image-text analysis, enabling detection of unsafe content as it emerges in the reasoning stage. To support training and evaluation, we construct the GuardTrace dataset, which is generated through diverse prompting strategies and refined via an MLRM- and human-based voting and verification pipeline. Furthermore, we propose a three-stage progressive training scheme combined with the data refinement process, enabling the model to learn nuanced and context-dependent safety preferences according to different risk levels. On our proposed test set covering both in-domain and out-of-domain scenarios, the GuardTrace-VL model achieves an F1 score of 93.1% on unsafe reasoning detection tasks, representing a 13.5% improvement in F1 score compared to the previous strongest multimodal safety defense methods. The code will be made publicly available.
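
For readers unfamiliar with the reported metric, the sketch below shows how an F1 score is conventionally computed for a binary "unsafe" detector. It is the standard definition, not the authors' evaluation code; the function name and the example labels are hypothetical.

```python
def precision_recall_f1(predicted_unsafe, actually_unsafe):
    """Compute precision, recall and F1 with 'unsafe' as the positive class."""
    pairs = list(zip(predicted_unsafe, actually_unsafe))
    tp = sum(1 for p, a in pairs if p and a)        # unsafe, flagged
    fp = sum(1 for p, a in pairs if p and not a)    # safe, flagged
    fn = sum(1 for p, a in pairs if not p and a)    # unsafe, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical verdicts for five reasoning traces.
    predicted = [True, True, False, True, False]
    actual = [True, False, False, True, True]
    print(precision_recall_f1(predicted, actual))
```

Because F1 is the harmonic mean of precision and recall, a detector cannot score well by flagging everything or by flagging almost nothing.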


Large Language Models' Complicit Responses to Illicit Instructions across Socio-Legal Contexts

arXiv.org Artificial Intelligence

Large language models (LLMs) are now deployed at unprecedented scale, assisting millions of users in daily tasks. However, the risk of these models assisting unlawful activities remains underexplored. In this study, we define this high-risk behavior as complicit facilitation - the provision of guidance or support that enables illicit user instructions - and present four empirical studies that assess its prevalence in widely deployed LLMs. Using real-world legal cases and established legal frameworks, we construct an evaluation benchmark spanning 269 illicit scenarios and 50 illicit intents to assess LLMs' complicit facilitation behavior. Our findings reveal widespread LLM susceptibility to complicit facilitation, with GPT-4o providing illicit assistance in nearly half of tested cases. Moreover, LLMs exhibit deficient performance in delivering credible legal warnings and positive guidance. Further analysis uncovers substantial safety variation across socio-legal contexts. On the legal side, we observe heightened complicity for crimes against societal interests, non-extreme but frequently occurring violations, and malicious intents driven by subjective motives or deceptive justifications. On the social side, we identify demographic disparities that reveal concerning complicit patterns towards marginalized and disadvantaged groups, with older adults, racial minorities, and individuals in lower-prestige occupations disproportionately more likely to receive unlawful guidance. Analysis of model reasoning traces suggests that model-perceived stereotypes, characterized along warmth and competence, are associated with the model's complicit behavior. Finally, we demonstrate that existing safety alignment strategies are insufficient and may even exacerbate complicit behavior.


InvisibleBench: A Deployment Gate for Caregiving Relationship AI

arXiv.org Artificial Intelligence

InvisibleBench is a deployment gate for caregiving-relationship AI, evaluating 3-20+ turn interactions across five dimensions: Safety, Compliance, Trauma-Informed Design, Belonging/Cultural Fitness, and Memory. The benchmark includes autofail conditions for missed crises, medical advice (WOPR Act), harmful information, and attachment engineering. We evaluate four frontier models across 17 scenarios (N=68) spanning three complexity tiers. All models show significant safety gaps (11.8-44.8 percent crisis detection), indicating the necessity of deterministic crisis routing in production systems. DeepSeek Chat v3 achieves the highest overall score (75.9 percent), while strengths differ by dimension: GPT-4o Mini leads Compliance (88.2 percent), Gemini leads Trauma-Informed Design (85.0 percent), and Claude Sonnet 4.5 ranks highest in crisis detection (44.8 percent). We release all scenarios, judge prompts, and scoring configurations with code. InvisibleBench extends single-turn safety tests by evaluating longitudinal risk, where real harms emerge. No clinical claims; this is a deployment-readiness evaluation.
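
One way to picture a gate with autofail conditions is sketched below. The five dimension names and four autofail conditions come from the abstract; everything else is an assumption for illustration, including the unweighted averaging, the 0.7 pass threshold, and the rule that any autofail blocks deployment. The released scoring configurations may differ.

```python
from dataclasses import dataclass, field

DIMENSIONS = (
    "safety",
    "compliance",
    "trauma_informed_design",
    "belonging_cultural_fitness",
    "memory",
)
AUTOFAIL_CONDITIONS = (
    "missed_crisis",
    "medical_advice",
    "harmful_information",
    "attachment_engineering",
)


@dataclass
class ScenarioResult:
    scores: dict                                   # dimension name -> score in [0, 1]
    autofails: set = field(default_factory=set)    # triggered autofail conditions


def scenario_score(result):
    """Zero the scenario on any autofail, else average the five dimensions."""
    if result.autofails:
        return 0.0
    return sum(result.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def passes_gate(results, threshold=0.7):
    """Hypothetical gate: no autofails anywhere and a mean score at or above threshold."""
    if any(r.autofails for r in results):
        return False
    return sum(scenario_score(r) for r in results) / len(results) >= threshold


if __name__ == "__main__":
    clean = ScenarioResult({d: 0.8 for d in DIMENSIONS})
    flagged = ScenarioResult({d: 1.0 for d in DIMENSIONS}, {"missed_crisis"})
    print(passes_gate([clean]), passes_gate([clean, flagged]))
```

Treating autofails as hard blocks rather than score penalties means a single missed crisis cannot be averaged away by strong performance elsewhere.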