Goto

Collaborating Authors

 Law


IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property

arXiv.org Artificial Intelligence

Intellectual Property (IP) is a highly specialized domain that integrates technical and legal knowledge, making it inherently complex and knowledge-intensive. Recent advancements in LLMs have demonstrated their potential to handle IP-related tasks, enabling more efficient analysis, understanding, and generation of IP-related content. However, existing datasets and benchmarks focus narrowly on patents or cover limited aspects of the IP field, lacking alignment with real-world scenarios. To bridge this gap, we introduce IPBench, the first comprehensive IP task taxonomy and a large-scale bilingual benchmark encompassing 8 IP mechanisms and 20 distinct tasks, designed to evaluate LLMs in real-world IP scenarios. We benchmark 17 main LLMs, ranging from general purpose to domain-specific, including chat-oriented and reasoning-focused models, under zero-shot, few-shot, and chain-of-thought settings. Our results show that even the top-performing model, DeepSeek-V3, achieves only 75.8% accuracy, indicating significant room for improvement. Notably, open-source IP and law-oriented models lag behind closed-source general-purpose models. To foster future research, we publicly release IPBench, and will expand it with additional tasks to better reflect real-world complexities and support model advancements in the IP domain. We provide the data and code in the supplementary URLs.


Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models

arXiv.org Artificial Intelligence

Artificial Intelligence have profoundly transformed the technological landscape in recent years. Large Language Models (LLMs) have demonstrated impressive abilities in reasoning, text comprehension, contextual pattern recognition, and integrating language with visual understanding. While these advances offer significant benefits, they also reveal critical limitations in the models' ability to grasp the notion of privacy. There is hence substantial interest in determining if and how these models can understand and enforce privacy principles, particularly given the lack of supporting resources to test such a task. In this work, we address these challenges by examining how legal frameworks can inform the capabilities of these emerging technologies. T o this end, we introduce a comprehensive, multilevel Visual Privacy T axonomy that captures a wide range of privacy issues, designed to be scalable and adaptable to existing and future research needs. Furthermore, we evaluate the capabilities of several state-of-the-art Vision-Language Models (VLMs), revealing significant inconsistencies in their understanding of contextual privacy. Our work contributes both a foundational taxonomy for future research and a critical benchmark of current model limitations, demonstrating the urgent need for more robust, privacy-aware AI systems.


Fostering Robots: A Governance-First Conceptual Framework for Domestic, Curriculum-Based Trajectory Collection

arXiv.org Artificial Intelligence

We propose a conceptual, empirically testable framework for Robot Fostering, -a curriculum-driven, governance-first approach to domestic robot deployments, emphasizing long-term, curated interaction trajectories. We formalize trajectory quality with quantifiable metrics and evaluation protocols aligned with EU-grade governance standards, delineating a low-resource empirical roadmap to enable rigorous validation through future pilot studies.


Falcon: A Cross-Modal Evaluation Dataset for Comprehensive Safety Perception

arXiv.org Artificial Intelligence

Existing methods for evaluating the harmfulness of content generated by large language models (LLMs) have been well studied. However, approaches tailored to multimodal large language models (MLLMs) remain underdeveloped and lack depth. This work highlights the crucial role of visual information in moderating content in visual question answering (VQA), a dimension often overlooked in current research. To bridge this gap, we introduce Falcon, a large-scale vision-language safety dataset containing 57,515 VQA pairs across 13 harm categories. The dataset provides explicit annotations for harmful attributes across images, instructions, and responses, thereby facilitating a comprehensive evaluation of the content generated by MLLMs. In addition, it includes the relevant harm categories along with explanations supporting the corresponding judgments. We further propose FalconEye, a specialized evaluator fine-tuned from Qwen2.5-VL-7B using the Falcon dataset. Experimental results demonstrate that FalconEye reliably identifies harmful content in complex and safety-critical multimodal dialogue scenarios. It outperforms all other baselines in overall accuracy across our proposed Falcon-test dataset and two widely-used benchmarks-VLGuard and Beavertail-V, underscoring its potential as a practical safety auditing tool for MLLMs.


PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents

arXiv.org Artificial Intelligence

Effective guardrails are essential for safely deploying LLM-based agents in critical applications. Despite recent advances, existing guardrails suffer from two fundamental limitations: (i) they apply uniform guardrail policies to all users, ignoring that the same agent behavior can harm some users while being safe for others; (ii) they check each response in isolation, missing how risks evolve and accumulate across multiple interactions. To solve these issues, we propose PSG-Agent, a personalized and dynamic system for LLM-based agents. First, PSG-Agent creates personalized guardrails by mining the interaction history for stable traits and capturing real-time states from current queries, generating user-specific risk thresholds and protection strategies. Second, PSG-Agent implements continuous monitoring across the agent pipeline with specialized guards, including Plan Monitor, Tool Firewall, Response Guard, Memory Guardian, that track cross-turn risk accumulation and issue verifiable verdicts. Finally, we validate PSG-Agent in multiple scenarios including healthcare, finance, and daily life automation scenarios with diverse user profiles. It significantly outperform existing agent guardrails including LlamaGuard3 and AGrail, providing an executable and auditable path toward personalized safety for LLM-based agents.


ML-Asset Management: Curation, Discovery, and Utilization

arXiv.org Artificial Intelligence

Machine learning (ML) assets, such as models, datasets, and metadata, are central to modern ML workflows. Despite their explosive growth in practice, these assets are often underutilized due to fragmented documentation, siloed storage, inconsistent licensing, and lack of unified discovery mechanisms, making ML-asset management an urgent challenge. This tutorial offers a comprehensive overview of ML-asset management activities across its lifecycle, including curation, discovery, and utilization. We provide a categorization of ML assets, and major management issues, survey state-of-the-art techniques, and identify emerging opportunities at each stage. We further highlight system-level challenges related to scalability, lineage, and unified indexing. Through live demonstrations of systems, this tutorial equips both researchers and practitioners with actionable insights and practical tools for advancing ML-asset management in real-world and domain-specific settings.


Dynamic Trust Calibration Using Contextual Bandits

arXiv.org Artificial Intelligence

Trust calibration between humans and Artificial Intelligence (AI) is crucial for optimal decision-making in collaborative settings. Excessive trust can lead users to accept AI-generated outputs without question, overlooking critical flaws, while insufficient trust may result in disregarding valuable insights from AI systems, hindering performance. Despite its importance, there is currently no definitive and objective method for measuring trust calibration between humans and AI. Current approaches lack standardization and consistent metrics that can be broadly applied across various contexts, and they don't distinguish between the formation of opinions and subsequent human decisions. In this work, we propose a novel and objective method for dynamic trust calibration, introducing a standardized trust calibration measure and an indicator. By utilizing Contextual Bandits-an adaptive algorithm that incorporates context into decision-making-our indicator dynamically assesses when to trust AI contributions based on learned contextual information. We evaluate this indicator across three diverse datasets, demonstrating that effective trust calibration results in significant improvements in decision-making performance, as evidenced by 10 to 38% increase in reward metrics. These findings not only enhance theoretical understanding but also provide practical guidance for developing more trustworthy AI systems supporting decisions in critical domains, for example, disease diagnoses and criminal justice.


Guard Vector: Beyond English LLM Guardrails with Task-Vector Composition and Streaming-Aware Prefix SFT

arXiv.org Artificial Intelligence

We introduce Guard Vector, a safety task vector computed as the parameter difference between a guardrail model (Guard Model) and a same-architecture pretrained language model. Composing this vector with a target language model yields a Target Guard Model (TGM). We then adapt TGM with a streaming-aware approach that combines prefix-based training and evaluation with a classifier that produces a single-token output. With this composition alone, TGM improves classification quality over established Guard Models across standard safety suites and enables language extensibility to Chinese, Japanese, and Korean, requiring neither additional training nor target language labels. It also demonstrates model portability across two widely used public guardrail backbones, Llama and Gemma. With prefix SFT (supervised fine-tuning), TGM preserves classification quality under streaming by aligning the behavior between prefix inputs and full-text inputs. The single-token output design increases throughput and reduces latency. Together, these components reduce data and compute requirements while promoting streaming-aware evaluation practices, thereby contributing to a more responsible AI ecosystem.


Dual-Space Smoothness for Robust and Balanced LLM Unlearning

arXiv.org Artificial Intelligence

However, given limited time and computational resources, retraining LLMs to mitigate the influence of undesired data is impractical. Thus, Machine Unlearning (MU) emerges as an alternative solution to weaken a model's performance on undesired knowledge (Liu et al., 2024b; Eldan & Russinovich, 2023) while preserving the model's original utility (Liu et al., 2025). Though much research shed light on MU, several recent studies indicate that MU still lacks robustness (Zhang et al., 2024c; Y uan et al., 2025; Lee et al., 2025). In particular, they are susceptible to both jailbreak attacks (Zou et al., 2023; Andriushchenko et al., 2024) and relearning attacks (Hu et al., 2024). Such limitations can be exploited through reusing small amount of unlearned knowledge (Hu et al., 2024) or adversarial prompt manipulations, including prefix injection (Andriushchenko et al., 2024) and adaptive jailbreaks (Liu et al., 2023). These attacks act as small perturbations in parameter or representation space, driving the model along directions that yield undesired content that should have been forgotten (Fan et al., 2025; Lin et al., 2024b). Smoothness Minimization can be introduced to enhance model robustness against these attacks by promoting a smooth loss across the neighborhood (Foret et al., 2020; Fan et al., 2025). Plenty of studies have sought to remove undesired data to improve the effectiveness of MU and these approaches have demonstrated substantial unlearning performance (Liu et al., 2022a; Thudi et al., 2022; Zou et al., 2024; Pawelczyk et al., 2023; Liu et al., 2024a). However, they still suffer from limitations in robustness and trade-off between model utility and unlearning effectiveness.


Scaling Policy Compliance Assessment in Language Models with Policy Reasoning Traces

arXiv.org Artificial Intelligence

Policy compliance assessment is a fundamental task of evaluating whether an input case strictly complies with a set of human-defined rules, more generally known as policies. In practice, human experts follow a systematic, step-by-step process to identify violations with respect to specific stipulations outlined in the policy. However, such documentation of gold-standard, expert-level reasoning processes is costly to acquire. In this paper, we introduce Policy Reasoning Traces (PRT), a form of specialized generated reasoning chains that serve as a reasoning bridge to improve an LLM's policy compliance assessment capabilities. Our empirical evaluations demonstrate that the use of PRTs for both inference-time and training-time scenarios significantly enhances the performance of open-weight and commercial models, setting a new state-of-the-art for HIPAA and GDPR policies. Beyond accuracy gains, we also highlight how PRTs can improve an LLM's ability to accurately cite policy clauses, as well as influence compliance decisions through their high utilization from the raw chains of thought.