Law
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Tripathi, Vishesh, Odapally, Tanmay, Das, Indraneel, Allu, Uday, Ahmed, Biddwan
Retrieval-Augmented Generation (RAG) systems have revolutionized information retrieval and question answering, but traditional text-based chunking methods struggle with complex document structures, multi-page tables, embedded figures, and contextual dependencies across page boundaries. We present a novel multimodal document chunking approach that leverages Large Multimodal Models (LMMs) to process PDF documents in batches while maintaining semantic coherence and structural integrity. Our method processes documents in configurable page batches with cross-batch context preservation, enabling accurate handling of tables spanning multiple pages, embedded visual elements, and procedural content. We evaluate our approach on a curated dataset of PDF documents with manually crafted queries, demonstrating improvements in chunk quality and downstream RAG performance. Our vision-guided approach achieves better accuracy compared to traditional vanilla RAG systems, with qualitative analysis showing superior preservation of document structure and semantic coherence.
Towards Reliable Forgetting: A Survey on Machine Unlearning Verification
Xue, Lulu, Hu, Shengshan, Lu, Wei, Shen, Yan, Li, Dongxu, Guo, Peijin, Zhou, Ziqi, Li, Minghui, Zhang, Yanjun, Zhang, Leo Yu
With growing demands for privacy protection, security, and legal compliance (e.g., GDPR), machine unlearning has emerged as a critical technique for ensuring the controllability and regulatory alignment of machine learning models. However, a fundamental challenge in this field lies in effectively verifying whether unlearning operations have been successfully and thoroughly executed. Despite a growing body of work on unlearning techniques, verification methodologies remain comparatively underexplored and often fragmented. Existing approaches lack a unified taxonomy and a systematic framework for evaluation. To bridge this gap, this paper presents the first structured survey of machine unlearning verification methods. We propose a taxonomy that organizes current techniques into two principal categories -- behavioral verification and parametric verification -- based on the type of evidence used to assess unlearning fidelity. We examine representative methods within each category, analyze their underlying assumptions, strengths, and limitations, and identify potential vulnerabilities in practical deployment. In closing, we articulate a set of open problems in current verification research, aiming to provide a foundation for developing more robust, efficient, and theoretically grounded unlearning verification mechanisms.
Bypassing LLM Guardrails: An Empirical Analysis of Evasion Attacks against Prompt Injection and Jailbreak Detection Systems
Hackett, William, Birch, Lewis, Trawicki, Stefan, Suri, Neeraj, Garraghan, Peter
Large Language Models (LLMs) guardrail systems are designed to protect against prompt injection and jailbreak attacks. However, they remain vulnerable to evasion techniques. We demonstrate two approaches for bypassing LLM prompt injection and jailbreak detection systems via traditional character injection methods and algorithmic Adversarial Machine Learning (AML) evasion techniques. Through testing against six prominent protection systems, including Microsoft's Azure Prompt Shield and Meta's Prompt Guard, we show that both methods can be used to evade detection while maintaining adversarial utility achieving in some instances up to 100% evasion success. Furthermore, we demonstrate that adversaries can enhance Attack Success Rates (ASR) against black-box targets by leveraging word importance ranking computed by offline white-box models. Our findings reveal vulnerabilities within current LLM protection mechanisms and highlight the need for more robust guardrail systems.
Can Group Relative Policy Optimization Improve Thai Legal Reasoning and Question Answering?
Akarajaradwong, Pawitsapak, Chaksangchaichot, Chompakorn, Pothavorn, Pirat, Thamrongrattanarit-Rutherford, Attapol, Chuangsuwanich, Ekapol, Nutanong, Sarana
The Retrieval-Augmented Generation (RAG) systems' performance on Thai legal question answering is still limited, especially for questions requiring extensive, complex legal reasoning. To address these limitations, we introduce an approach aligning LLMs toward improved law citation accuracy and better response quality using Group-Relative Policy Optimization (GRPO). Our approach leverages BGE-M3 embeddings as a cost-efficient semantic-similarity reward, significantly reducing computational expenses up to 2.5x compared to large language model judges. Experiments on the NitiBench benchmark demonstrate substantial improvements: GRPO achieves up to 90% citation-F1 gains from the base model and a 31% increase in joint quality metrics over instruction tuning. Crucially, our method shows enhanced robustness on complex legal reasoning tasks compared to instruction tuning, providing an effective and resource-efficient solution for enhancing Thai legal LLMs.
humancompatible.interconnect: Testing Properties of Repeated Uses of Interconnections of AI Systems
Nazarov, Rodion, Quinn, Anthony, Shorten, Robert, Marecek, Jakub
Artificial intelligence (AI) systems often interact with multiple agents. The regulation of such AI systems often requires that {\em a priori\/} guarantees of fairness and robustness be satisfied. With stochastic models of agents' responses to the outputs of AI systems, such {\em a priori\/} guarantees require non-trivial reasoning about the corresponding stochastic systems. Here, we present an open-source PyTorch-based toolkit for the use of stochastic control techniques in modelling interconnections of AI systems and properties of their repeated uses. It models robustness and fairness desiderata in a closed-loop fashion, and provides {\em a priori\/} guarantees for these interconnections. The PyTorch-based toolkit removes much of the complexity associated with the provision of fairness guarantees for closed-loop models of multi-agent systems.
A Taxonomy of Omnicidal Futures Involving Artificial Intelligence
Critch, Andrew, Tsimerman, Jacob
This report presents a taxonomy and examples of potential omnicidal events resulting from AI: scenarios where all or almost all humans are killed. These events are not presented as inevitable, but as possibilities that we can work to avoid. Insofar as large institutions require a degree of public support in order to take certain actions, we hope that by presenting these possibilities in public, we can help to support preventive measures against catastrophic risks from AI.
The Engineer's Dilemma: A Review of Establishing a Legal Framework for Integrating Machine Learning in Construction by Navigating Precedents and Industry Expectations
Despite the widespread interest in machine learning (ML), the engineering industry has not yet fully adopted ML-based methods, which has left engineers and stakeholders uncertain about the legal and regulatory frameworks that govern their decisions. This gap remains unaddressed as an engineer's decision-making process, typically governed by professional ethics and practical guidelines, now intersects with complex algorithmic outputs. To bridge this gap, this paper explores how engineers can navigate legal principles and legislative justifications that support and/or contest the deployment of ML technologies. Drawing on recent precedents and experiences gained from other fields, this paper argues that analogical reasoning can provide a basis for embedding ML within existing engineering codes while maintaining professional accountability and meeting safety requirements. In exploring these issues, the discussion focuses on established liability doctrines, such as negligence and product liability, and highlights how courts have evaluated the use of predictive models. We further analyze how legislative bodies and standard-setting organizations can furnish explicit guidance equivalent to prior endorsements of emergent technologies. This exploration stresses the vitality of understanding the interplay between technical justifications and legal precedents for shaping an informed stance on ML's legitimacy in engineering practice. Finally, our analysis catalyzes a legal framework for integrating ML through which stakeholders can critically assess the responsibilities, liabilities, and benefits inherent in ML-driven engineering solutions.
The Consistency-Acceptability Divergence of LLMs in Judicial Decision-Making: Task and Stakeholder Dimensions
The integration of large language model (LLM) technology into judicial systems is fundamentally transforming legal practice worldwide. However, this global transformation has revealed an urgent paradox requiring immediate attention. This study introduces the concept of ``consistency-acceptability divergence'' for the first time, referring to the gap between technical consistency and social acceptance. While LLMs achieve high consistency at the technical level, this consistency demonstrates both positive and negative effects. Through comprehensive analysis of recent data on LLM judicial applications from 2023--2025, this study finds that addressing this challenge requires understanding both task and stakeholder dimensions. This study proposes the Dual-Track Deliberative Multi-Role LLM Judicial Governance Framework (DTDMR-LJGF), which enables intelligent task classification and meaningful interaction among diverse stakeholders. This framework offers both theoretical insights and practical guidance for building an LLM judicial ecosystem that balances technical efficiency with social legitimacy.
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
Qureshi, Rizwan, Sapkota, Ranjan, Shah, Abbas, Muneer, Amgad, Zafar, Anas, Vayani, Ashmal, Shoman, Maged, Eldaly, Abdelrahman B. M., Zhang, Kai, Sadak, Ferhat, Raza, Shaina, Fan, Xinqi, Shwartz-Ziv, Ravid, Yan, Hong, Jain, Vinjia, Chadha, Aman, Karkee, Manoj, Wu, Jia, Mirjalili, Seyedali
Can machines truly think, reason and act in domains like humans? This enduring question continues to shape the pursuit of Artificial General Intelligence (AGI). Despite the growing capabilities of models such as GPT-4.5, DeepSeek, Claude 3.5 Sonnet, Phi-4, and Grok 3, which exhibit multimodal fluency and partial reasoning, these systems remain fundamentally limited by their reliance on token-level prediction and lack of grounded agency. This paper offers a cross-disciplinary synthesis of AGI development, spanning artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems. We analyze the architectural and cognitive foundations of general intelligence, highlighting the role of modular reasoning, persistent memory, and multi-agent coordination. In particular, we emphasize the rise of Agentic RAG frameworks that combine retrieval, planning, and dynamic tool use to enable more adaptive behavior. We discuss generalization strategies, including information compression, test-time adaptation, and training-free methods, as critical pathways toward flexible, domain-agnostic intelligence. Vision-Language Models (VLMs) are reexamined not just as perception modules but as evolving interfaces for embodied understanding and collaborative task completion. We also argue that true intelligence arises not from scale alone but from the integration of memory and reasoning: an orchestration of modular, interactive, and self-improving components where compression enables adaptive behavior. Drawing on advances in neurosymbolic systems, reinforcement learning, and cognitive scaffolding, we explore how recent architectures begin to bridge the gap between statistical learning and goal-directed cognition. Finally, we identify key scientific, technical, and ethical challenges on the path to AGI.
Judges Don't Know What AI's Book Piracy Means
More than 40 lawsuits have been filed against AI companies since 2022. Late last month, there were rulings on two of these cases, first in a lawsuit against Anthropic and, two days later, in one against Meta. Both of the cases were brought by book authors who alleged that AI companies had trained large language models using authors' work without consent or compensation. In each case, the judges decided that the tech companies were engaged in "fair use" when they trained their models with authors' books. Both judges said that the use of these books was "transformative"--that training an LLM resulted in a fundamentally different product that does not directly compete with those books.