Law
Is Trump the end of the international rules-based order?
After more than a year of Israeli bombing, tens of thousands of Palestinian deaths, and a humanitarian catastrophe in Gaza, the world was largely united in saying "enough is enough". United Nations General Assembly (UNGA) resolution 12667 in December was clear in its demand: An immediate ceasefire in Gaza. Countries as diverse as Vietnam, Zimbabwe and Colombia echoed that call. And yet, bucking that consensus were nine "no" votes – chief among them, as is typical when it comes to resolutions calling for Israel to adhere to international law or human rights, was the United States. The US has provided unwavering support to Israel throughout its war on Gaza, even as Israel faces accusations of genocide at the International Court of Justice (ICJ) and its prime minister has an International Criminal Court (ICC) arrest warrant to his name.
Pareidolic Illusions of Meaning: ChatGPT, Pseudolaw and the Triumph of Form over Substance
The early 2020s has seen the rise of two strange and potentially quite impactful social phenomena, namely pseudolaw, where users rely upon pseudolegal arguments that mimic the form and ritual of legal argumentation but fundamentally distort the content of law, and generative AI/LLMs, which generate content that uses probabilistic calculations to create outputs that look like human generated text. This article argues that the juxtaposition of the two phenomena helps to reveal that they both share two fundamental traits as both elevate form and appearance over substance and content, and users of both routinely mistake the form for the substance. In drawing upon legal theory, computer science, linguistics and cognitive psychology, the article argues that both phenomena rely upon creating illusions of meaning that users mistake for the underlying primary phenomenon. I then explore four implications of this conception of both phenomena. Firstly, both rely on human tendencies of conceptual pareidolia resulting in the erroneous perception of meaningful linguistic legal patterns from nebulous inputs. Secondly, both rely upon the confidence heuristic, the human cognitive bias for treating confidence as a proxy for competence. Thirdly, both succeed when the primary concern is with the form of the output and not its content. Fourthly, both rely heavily upon the magical thinking of users and the desire for the promise of the approach to be real. The article argues that the legal context helps to reveal a solution for the problems caused by both phenomena as it is only where users possess sufficient legal and technological literacy that it becomes possible to reveal to them the illusionary nature of the phenomena.
FedGAI: Federated Style Learning with Cloud-Edge Collaboration for Generative AI in Fashion Design
Wu, Mingzhu, Jiang, Jianan, Li, Xinglin, Deng, Hanhui, Wu, Di
Collaboration can amalgamate diverse ideas, styles, and visual elements, fostering creativity and innovation among different designers. In collaborative design, sketches play a pivotal role as a means of expressing design creativity. However, designers often tend to not openly share these meticulously crafted sketches. This phenomenon of data island in the design area hinders its digital transformation under the third wave of AI. In this paper, we introduce a Federated Generative Artificial Intelligence Clothing system, namely FedGAI, employing federated learning to aid in sketch design. FedGAI is committed to establishing an ecosystem wherein designers can exchange sketch styles among themselves. Through FedGAI, designers can generate sketches that incorporate various designers' styles from their peers, drawing inspiration from collaboration without the need for data disclosure or upload. Extensive performance evaluations indicate that our FedGAI system can produce multi-styled sketches of comparable quality to human-designed ones while significantly enhancing efficiency compared to hand-drawn sketches.
Synthetic Data for Robust AI Model Development in Regulated Enterprises
In today's business landscape, organizations need to find the right balance between using their customers' data ethically to power AI solutions and being compliant regarding data privacy and data usage regulations. In this paper, we discuss synthetic data as a possible solution to this dilemma. Synthetic data is simulated data that mimics the real data. We explore how organizations in heavily regulated industries, such as financial institutions or healthcare organizations, can leverage synthetic data to build robust AI solutions while staying compliant. We demonstrate that synthetic data offers two significant advantages by allowing AI models to learn from more diverse data and by helping organizations stay compliant against data privacy laws with the use of synthetic data instead of customer information. We discuss case studies to show how synthetic data can be effectively used in the finance and healthcare sector while discussing the challenges of using synthetic data and some ethical questions it raises. Our research finds that synthetic data could be a game-changer for AI in regulated industries. The potential can be realized when industry, academia, and regulators collaborate to build solutions. We aim to initiate discussions on the use of synthetic data to build ethical, responsible, and effective AI systems in regulated enterprise industries.
Using LLMs for Automated Privacy Policy Analysis: Prompt Engineering, Fine-Tuning and Explainability
Chen, Yuxin, Tang, Peng, Qiu, Weidong, Li, Shujun
Privacy policies are widely used by digital services and often required for legal purposes. Many machine learning based classifiers have been developed to automate detection of different concepts in a given privacy policy, which can help facilitate other automated tasks such as producing a more reader-friendly summary and detecting legal compliance issues. Despite the successful applications of large language models (LLMs) to many NLP tasks in various domains, there is very little work studying the use of LLMs for automated privacy policy analysis, therefore, if and how LLMs can help automate privacy policy analysis remains under-explored. To fill this research gap, we conducted a comprehensive evaluation of LLM-based privacy policy concept classifiers, employing both prompt engineering and LoRA (low-rank adaptation) fine-tuning, on four state-of-the-art (SOTA) privacy policy corpora and taxonomies. Our experimental results demonstrated that combining prompt engineering and fine-tuning can make LLM-based classifiers outperform other SOTA methods, \emph{significantly} and \emph{consistently} across privacy policy corpora/taxonomies and concepts. Furthermore, we evaluated the explainability of the LLM-based classifiers using three metrics: completeness, logicality, and comprehensibility. For all three metrics, a score exceeding 91.1\% was observed in our evaluation, indicating that LLMs are not only useful to improve the classification performance, but also to enhance the explainability of detection results.
AI Agents: Evolution, Architecture, and Real-World Applications
Artificial Intelligence (AI) has evolved dramatically over the past decade, transitioning from specialized systems designed for narrow tasks to increasingly sophisticated architectures capable of autonomous operation across diverse domains. Among these advancements, AI agents represent a particularly significant development, embodying a paradigm shift in how intelligent systems interact with their environments, make decisions, and achieve complex goals. Unlike traditional AI systems that execute predefined algorithms within constraints, AI agents possess the capacity to autonomously perceive, reason, and act, often adapting their behavior based on environmental feedback and accumulated experience. The concept of an AI agent refers to a system or program that is capable of autonomously performing tasks on behalf of a user or another system by designing its workflow and utilizing available tools. These agents can encompass a wide range of functionalities beyond natural language processing, including decision making, problem solving, interacting with external environments, and executing actions. As Kapoor et al. (2024) note in their analysis of agent benchmarks, the development of AI agents represents an exciting new research direction with significant implications for real-world applications across numerous industries. The evolution of AI agents has been accelerated by recent breakthroughs in large language models (LLMs), which have provided a foundation for more sophisticated reasoning capabilities. Modern AI agents leverage these advanced language models as core components, augmenting them with specialized modules for memory, planning, tool use, and environmental interaction. This integration enables agents to perform complex tasks that would be challenging or impossible for traditional AI systems, from reconciling financial statements to providing step-by-step instructions for field technicians based on contextual understanding of product information.
Chasing the Timber Trail: Machine Learning to Reveal Harvest Location Misrepresentation
Sarkar, Shailik, Yousuf, Raquib Bin, Wang, Linhan, Mayer, Brian, Mortier, Thomas, Deklerck, Victor, Truszkowski, Jakub, Simeone, John C., Norman, Marigold, Saunders, Jade, Lu, Chang-Tien, Ramakrishnan, Naren
Illegal logging poses a significant threat to global biodiversity, climate stability, and depresses international prices for legal wood harvesting and responsible forest products trade, affecting livelihoods and communities across the globe. Stable isotope ratio analysis (SIRA) is rapidly becoming an important tool for determining the harvest location of traded, organic, products. The spatial pattern in stable isotope ratio values depends on factors such as atmospheric and environmental conditions and can thus be used for geographic origin identification. We present here the results of a deployed machine learning pipeline where we leverage both isotope values and atmospheric variables to determine timber harvest location. Additionally, the pipeline incorporates uncertainty estimation to facilitate the interpretation of harvest location determination for analysts. We present our experiments on a collection of oak (Quercus spp.) tree samples from its global range. Our pipeline outperforms comparable state-of-the-art models determining geographic harvest origin of commercially traded wood products, and has been used by European enforcement agencies to identify harvest location misrepresentation. We also identify opportunities for further advancement of our framework and how it can be generalized to help identify the origin of falsely labeled organic products throughout the supply chain.
Interpretation Gaps in LLM-Assisted Comprehension of Privacy Documents
This article explores the gaps that can manifest when using a large language model (LLM) to obtain simplified interpretations of data practices from a complex privacy policy. We exemplify these gaps to showcase issues in accuracy, completeness, clarity and representation, while advocating for continued research to realize an LLM's true potential in revolutionizing privacy management through personal assistants and automated compliance checking.
Augmented Adversarial Trigger Learning
Gradient optimization-based adversarial attack methods automate the learning of adversarial triggers to generate jailbreak prompts or leak system prompts. In this work, we take a closer look at the optimization objective of adversarial trigger learning and propose ATLA: Adversarial Trigger Learning with Augmented objectives. ATLA improves the negative log-likelihood loss used by previous studies into a weighted loss formulation that encourages the learned adversarial triggers to optimize more towards response format tokens. This enables ATLA to learn an adversarial trigger from just one query-response pair and the learned trigger generalizes well to other similar queries. We further design a variation to augment trigger optimization with an auxiliary loss that suppresses evasive responses. We showcase how to use ATLA to learn adversarial suffixes jailbreaking LLMs and to extract hidden system prompts. Empirically we demonstrate that ATLA consistently outperforms current state-of-the-art techniques, achieving nearly 100% success in attacking while requiring 80% fewer queries. ATLA learned jailbreak suffixes demonstrate high generalization to unseen queries and transfer well to new LLMs.
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models
Ravichander, Abhilasha, Fisher, Jillian, Sorensen, Taylor, Lu, Ximing, Lin, Yuchen, Antoniak, Maria, Mireshghallah, Niloofar, Bhagavatula, Chandra, Choi, Yejin
High-quality training data has proven crucial for developing performant large language models (LLMs). However, commercial LLM providers disclose few, if any, details about the data used for training. This lack of transparency creates multiple challenges: it limits external oversight and inspection of LLMs for issues such as copyright infringement, it undermines the agency of data authors, and it hinders scientific research on critical issues such as data contamination and data selection. How can we recover what training data is known to LLMs? In this work, we demonstrate a new method to identify training data known to proprietary LLMs like GPT-4 without requiring any access to model weights or token probabilities, by using information-guided probes. Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes. By evaluating a model's ability to successfully reconstruct high-surprisal tokens in text, we can identify a surprising number of texts memorized by LLMs.