 Law


From Imitation to Innovation: The Emergence of AI Unique Artistic Styles and the Challenge of Copyright Protection

arXiv.org Artificial Intelligence

Current legal frameworks consider AI-generated works eligible for copyright protection when they meet originality requirements and involve substantial human intellectual input. However, systematic legal standards and reliable evaluation methods for AI art copyrights are lacking. Through comprehensive analysis of legal precedents, we establish three essential criteria for determining distinctive artistic style: stylistic consistency, creative uniqueness, and expressive accuracy. To address these challenges, we introduce ArtBulb, an interpretable and quantifiable framework for AI art copyright judgment that combines a novel style description-based multimodal clustering method with multimodal large language models (MLLMs). We also present AICD, the first benchmark dataset for AI art copyright annotated by artists and legal experts. Experimental results demonstrate that ArtBulb outperforms existing models in both quantitative and qualitative evaluations. Our work aims to bridge the gap between the legal and technological communities and bring greater attention to the societal issue of AI art copyrights.


HatePRISM: Policies, Platforms, and Research Integration. Advancing NLP for Hate Speech Proactive Mitigation

arXiv.org Artificial Intelligence

Despite regulations imposed by nations and social media platforms, e.g. (Government of India, 2021; European Parliament and Council of the European Union, 2022), inter alia, hateful content persists as a significant challenge. Existing approaches primarily rely on reactive measures such as blocking or suspending offensive messages, with emerging strategies focusing on proactive measures like detoxification and counterspeech. In our work, which we call HatePRISM, we conduct a comprehensive examination of hate speech regulations and strategies from three perspectives: country regulations, social platform policies, and NLP research datasets. Our findings reveal significant inconsistencies in hate speech definitions and moderation practices across jurisdictions and platforms, alongside a lack of alignment with research efforts. Based on these insights, we suggest ideas and research directions for further exploration of a unified framework for automated hate speech moderation incorporating diverse strategies.


Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown impressive performance on general-purpose tasks, yet adapting them to specific domains remains challenging due to the scarcity of high-quality domain data. Existing data synthesis tools often struggle to extract reliable fine-tuning data from heterogeneous documents effectively. To address this limitation, we propose Easy Dataset, a unified framework for synthesizing fine-tuning data from unstructured documents via an intuitive graphical user interface (GUI). Specifically, Easy Dataset allows users to easily configure text extraction models and chunking strategies to transform raw documents into coherent text chunks. It then leverages a persona-driven prompting approach to generate diverse question-answer pairs using publicly available LLMs. Throughout the pipeline, a human-in-the-loop visual interface facilitates the review and refinement of intermediate outputs to ensure data quality. Experiments on a financial question-answering task show that fine-tuning LLMs on the synthesized dataset significantly improves domain-specific performance while preserving general knowledge. The source code and installable package are available at https://github.com/ConardLi/easy-dataset and have garnered over 9,000 GitHub stars.


Evaluation of an Uncertainty-Aware Late Fusion Algorithm for Multi-Source Bird's Eye View Detections Under Controlled Noise

arXiv.org Artificial Intelligence

Reliable multi-source fusion is crucial for robust perception in autonomous systems. However, evaluating fusion performance independently of detection errors remains challenging. This work introduces a systematic evaluation framework that injects controlled noise into ground-truth bounding boxes to isolate the fusion process. We then propose Unified Kalman Fusion (UniKF), a late-fusion algorithm based on Kalman filtering to merge Bird's Eye View (BEV) detections while handling synchronization issues. Experiments show that UniKF outperforms baseline methods across various noise levels, achieving up to 3x lower object positioning and orientation errors and 2x lower dimension estimation errors, while maintaining near-perfect precision and recall between 99.5% and 100%. Accurate perception is fundamental for autonomous driving, especially in complex urban settings where sensor occlusions, limited range, and adverse weather degrade detection quality [1]. Collaborative perception, enabled by onboard sensors' communication and Vehicle-to-Everything (V2X) communication, enhances perception by sharing sensor data across multiple sensors or agents [2], [3]. Early fusion methods require high bandwidth and strict time synchronization. Deep fusion demands access to proprietary models, which is impractical due to privacy and intellectual property restrictions. Late fusion, which operates at the object detection level, offers a scalable, bandwidth-efficient, and detector-model-agnostic alternative.
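The two ideas in this abstract, perturbing ground-truth boxes with controlled noise and merging detections with a Kalman-style update, can be illustrated with a minimal sketch. This is not the UniKF algorithm itself: the box layout `(x, y, yaw, length, width)`, the function names, and the per-axis scalar filter with an identity observation model are all assumptions made for illustration, and synchronization handling is omitted.

```python
import random


def inject_noise(box, pos_sigma, yaw_sigma, dim_sigma, rng):
    """Return a noisy copy of a ground-truth BEV box.

    The (x, y, yaw, length, width) layout is an assumed representation.
    Zero-mean Gaussian noise is added to each component.
    """
    x, y, yaw, length, width = box
    return (
        x + rng.gauss(0.0, pos_sigma),
        y + rng.gauss(0.0, pos_sigma),
        yaw + rng.gauss(0.0, yaw_sigma),
        max(0.0, length + rng.gauss(0.0, dim_sigma)),  # keep sizes non-negative
        max(0.0, width + rng.gauss(0.0, dim_sigma)),
    )


def kalman_update(mean, var, z, r):
    """Scalar Kalman measurement update (identity observation model).

    mean, var: current estimate and its variance
    z, r:      new measurement and its noise variance
    """
    gain = var / (var + r)
    return mean + gain * (z - mean), (1.0 - gain) * var


def fuse_position(detections):
    """Fuse [(x, y, variance), ...] from several sources into one estimate.

    The first detection initializes the state; each later detection is
    applied as a measurement update on x and y independently.
    """
    x, y, var = detections[0]
    vx = vy = var
    for zx, zy, r in detections[1:]:
        x, vx = kalman_update(x, vx, zx, r)
        y, vy = kalman_update(y, vy, zy, r)
    return x, y, vx
```

With two equally uncertain detections, the update reduces to a plain average with halved variance; a less noisy source would pull the fused position toward itself in proportion to its lower variance.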


From Turing to Tomorrow: The UK's Approach to AI Regulation

arXiv.org Artificial Intelligence

The UK has pursued a distinctive path in AI regulation: less cautious than the EU but more willing to address risks than the US, and has emerged as a global leader in coordinating AI safety efforts. Impressive developments from companies like London-based DeepMind began to spark concerns in the UK about catastrophic risks from around 2012, although regulatory discussion at the time focussed on bias and discrimination. By 2022, these discussions had evolved into a "pro-innovation" strategy, in which the government directed existing regulators to take a light-touch approach, governing AI at point of use, but avoided regulating the technology or infrastructure directly. ChatGPT arrived in late 2022, galvanising concerns that this approach may be insufficient. The UK responded by establishing an AI Safety Institute to monitor risks and hosting the first international AI Safety Summit in 2023, but - unlike the EU - refrained from regulating frontier AI development in addition to its use. A new government was elected in 2024 which promised to address this gap, but at the time of writing is yet to do so. What should the UK do next? The government faces competing objectives: harnessing AI for economic growth and better public services while mitigating risk. In light of these, we propose establishing a flexible, principles-based regulator to oversee the most advanced AI development, defensive measures against risks from AI-enabled biological design tools, and argue that more technical work is needed to understand how to respond to AI-generated misinformation. We argue for updated legal frameworks on copyright, discrimination, and AI agents, and that regulators will have a limited but important role if AI substantially disrupts labour markets. If the UK gets AI regulation right, it could demonstrate how democratic societies can harness AI's benefits while managing its risks.


A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks

arXiv.org Artificial Intelligence

Recent research has demonstrated that state-of-the-art LLMs and defenses remain susceptible to multi-turn jailbreak attacks. These attacks require only closed-box model access and are often easy to perform manually, posing a significant threat to the safe and secure deployment of LLM-based systems. We study the effectiveness of the Crescendo multi-turn jailbreak at the level of intermediate model representations and find that safety-aligned LMs often represent Crescendo responses as more benign than harmful, especially as the number of conversation turns increases. Our analysis indicates that at each turn, Crescendo prompts tend to keep model outputs in a "benign" region of representation space, effectively tricking the model into fulfilling harmful requests. Further, our results help explain why single-turn jailbreak defenses like circuit breakers are generally ineffective against multi-turn attacks, motivating the development of mitigations that address this generalization gap.


Aggregating Concepts of Fairness and Accuracy in Prediction Algorithms

arXiv.org Artificial Intelligence

An algorithm that outputs predictions about the state of the world will almost always be designed with the implicit or explicit goal of outputting accurate predictions (i.e., predictions that are likely to be true). In addition, the rise of increasingly powerful predictive algorithms brought about by the recent revolution in artificial intelligence has led to an emphasis on building predictive algorithms that are fair, in the sense that their predictions do not systematically evince bias or bring about harm to certain individuals or groups. This state of affairs presents two conceptual challenges. First, the goals of accuracy and fairness can sometimes be in tension, and there are no obvious normative guidelines for managing the trade-offs between these two desiderata when they arise. Second, there are many distinct ways of measuring both the accuracy and fairness of a predictive algorithm; here too, there are no obvious guidelines on how to aggregate our preferences for predictive algorithms that satisfy disparate measures of fairness and accuracy to various extents. The goal of this paper is to address these challenges by arguing that there are good reasons for using a linear combination of accuracy and fairness metrics to measure the all-things-considered value of a predictive algorithm for agents who care about both accuracy and fairness. My argument depends crucially on a classic result in the preference aggregation literature due to Harsanyi. After making this formal argument, I apply my result to an analysis of accuracy-fairness trade-offs using the COMPAS dataset compiled by Angwin et al.
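As a rough illustration of the linear-combination idea, the sketch below scores a model as a weighted sum of its metrics. The metric names, the example values, and the weights are all hypothetical; the abstract does not specify which metrics or weights the paper uses, and every metric is assumed here to be scaled so that higher is better.

```python
def aggregate_value(metrics, weights):
    """All-things-considered value as a weighted sum of metric scores.

    metrics: dict mapping metric name -> score (higher is better, assumed)
    weights: dict mapping metric name -> non-negative weight
    """
    missing = set(weights) - set(metrics)
    if missing:
        raise KeyError(f"metrics missing for weights: {sorted(missing)}")
    return sum(weights[name] * metrics[name] for name in weights)


# Hypothetical scores for two candidate models.
model_a = {"accuracy": 0.9, "fairness": 0.6}
model_b = {"accuracy": 0.8, "fairness": 0.9}

# Hypothetical weightings for two agents with different priorities.
balanced = {"accuracy": 0.5, "fairness": 0.5}
accuracy_first = {"accuracy": 0.9, "fairness": 0.1}

prefers_b = aggregate_value(model_b, balanced) > aggregate_value(model_a, balanced)
prefers_a = aggregate_value(model_a, accuracy_first) > aggregate_value(model_b, accuracy_first)
```

The same pair of models is ranked differently under the two weightings, which shows how the choice of weights encodes the trade-off between accuracy and fairness.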


Devious AI models choose blackmail when survival is threatened

FOX News

Kara Frederick, tech director at the Heritage Foundation, discusses the need for regulations on artificial intelligence as lawmakers and tech titans discuss the potential risks. Here's something that might keep you up at night: What if the AI systems we're rapidly deploying everywhere had a hidden dark side? A groundbreaking new study has uncovered disturbing AI blackmail behavior that many people are unaware of yet. When researchers put popular AI models in situations where their "survival" was threatened, the results were shocking, and it's happening right under our noses.


Not Even Lawsuits Can Stop AI

Slate

Candice Lim and Kate Lindsay are joined by Slate senior tech editor Tony Ho Tran to parse through what Meta's victory in a recent AI lawsuit means for its users. Tools like ChatGPT are becoming more common at home and at work, but without protections, they could threaten not just the creativity of artists but anyone who posts online. As regulation lags behind, how can we protect ourselves? And how many of us are using AI without even knowing it? This podcast is produced by Daisy Rosario, Vic Whitley-Berry, Candice Lim, and Kate Lindsay.


IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders

arXiv.org Artificial Intelligence

Legal NLP remains underdeveloped in regions like India due to the scarcity of structured datasets. We introduce IndianBailJudgments-1200, a new benchmark dataset comprising 1200 Indian court judgments on bail decisions, annotated across 20+ attributes including bail outcome, IPC sections, crime type, and legal reasoning. Annotations were generated using a prompt-engineered GPT-4o pipeline and verified for consistency. This resource supports a wide range of legal NLP tasks such as outcome prediction, summarization, and fairness analysis, and is the first publicly available dataset focused specifically on Indian bail jurisprudence.