AITopics | legal document

Collaborating Authors

legal document

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AugAbEx : Way Forward for Extractive Case Summarization

Bindal, Purnima, Kumar, Vikas, Rathore, Sagar, Bhatnagar, Vasudha

arXiv.org Artificial IntelligenceNov-18-2025

Summarization of legal judgments poses a heavy cognitive burden on law practitioners due to the complexity of the language, context-sensitive legal jargon, and the length of the document. Therefore, the automatic summarization of legal documents has attracted serious attention from natural language processing researchers. Since the abstractive summaries of legal documents generated by deep neural methods remain prone to the risk of misrepresenting nuanced legal jargon or overlooking key contextual details, we envisage a rising trend toward the use of extractive case summarizers. Given the high cost of human annotation for gold standard extractive summaries, we engineer a light and transparent pipeline that leverages existing abstractive gold standard summaries to create the corresponding extractive gold standard versions. The approach ensures that the experts` opinions ensconced in the original gold standard abstractive summaries are carried over to the transformed extractive summaries. We aim to augment seven existing case summarization datasets, which include abstractive summaries, by incorporating corresponding extractive summaries and create an enriched data resource for case summarization research community. To ensure the quality of the augmented extractive summaries, we perform an extensive comparative evaluation with the original abstractive gold standard summaries covering structural, lexical, and semantic dimensions. We also compare the domain-level information of the two summaries. We commit to release the augmented datasets in the public domain for use by the research community and believe that the resource will offer opportunities to advance the field of automatic summarization of legal documents.

artificial intelligence, natural language, text processing, (19 more...)

arXiv.org Artificial Intelligence

2511.1229

Country: Asia > India (0.46)

Genre: Research Report (1.00)

Industry: Law (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.95)

Add feedback

LegalWiz: A Multi-Agent Generation Framework for Contradiction Detection in Legal Documents

Mantravadi, Ananya, Dalmia, Shivali, Pospelova, Olga, Mukherji, Abhishek, Dave, Nand, Mittal, Anudha

arXiv.org Artificial IntelligenceOct-14-2025

Retrieval-Augmented Generation (RAG) integrates large language models (LLMs) with external sources, but unresolved contradictions in retrieved evidence often lead to hallucinations and legally unsound outputs. Benchmarks currently used for contradiction detection lack domain realism, cover only limited conflict types, and rarely extend beyond single-sentence pairs, making them unsuitable for legal applications. Controlled generation of documents with embedded contradictions is therefore essential: it enables systematic stress-testing of models, ensures coverage of diverse conflict categories, and provides a reliable basis for evaluating contradiction detection and resolution. We present a multi-agent contradiction-aware benchmark framework for the legal domain that generates synthetic legal-style documents, injects six structured contradiction types, and models both self- and pairwise inconsistencies. Automated contradiction mining is combined with human-in-the-loop validation to guarantee plausibility and fidelity. This benchmark offers one of the first structured resources for contradiction-aware evaluation in legal RAG pipelines, supporting more consistent, interpretable, and trustworthy systems.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.03418

Country: Asia (0.46)

Genre: Research Report (1.00)

Industry: Law > Intellectual Property & Technology Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

An Senegalese Legal Texts Structuration Using LLM-augmented Knowledge Graph

Kane, Oumar, Allaya, Mouhamad M., Samb, Dame, Bousso, Mamadou

arXiv.org Artificial IntelligenceOct-6-2025

Abstract--This study examines the application of artificial intelligence (AI) and large language models (LLM) to improve access to legal texts in Senegal's judicial system. The emphasis is on the difficulties of extracting and organizing legal documents, highlighting the need for better access to judicial information. The research successfully extracted 7,967 articles from various legal documents, particularly focusing on the Land and Public Domain Code. A detailed graph database was developed, which contains 2,872 nodes and 10,774 relationships, aiding in the visualization of interconnections within legal texts. In addition, advanced triple extraction techniques were utilized for knowledge, demonstrating the effectiveness of models such as GPT - 4o, GPT -4, and Mistral-Large in identifying relationships and relevant metadata. Through these technologies, the aim is to create a solid framework that allows Senegalese citizens and legal professionals to more effectively understand their rights and responsibilities. Artificial intelligence (AI) is a transformative technology that raises significant ethical considerations regarding its use. Initiatives like Microsoft's "AI for Humanitarian Action" and Google's "AI for Social Good" focus on enhancing jurisprudence and human rights [1]. Moreover, the Center for Social Good Data Science at the University of Chicago applies AI to improve criminal justice systems.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.02353

Country:

Africa > Senegal (0.68)
North America > United States > Illinois > Cook County > Chicago (0.24)

Genre: Research Report > Experimental Study (0.34)

Industry: Law > Criminal Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Hybrid Topic-Semantic Labeling and Graph Embeddings for Unsupervised Legal Document Clustering

Bastola, Deepak, Choi, Woohyeok

arXiv.org Machine LearningSep-3-2025

Legal documents pose unique challenges for text classification due to their domain-specific language and often limited labeled data. This paper proposes a hybrid approach for classifying legal texts by combining unsupervised topic and graph embeddings with a supervised model. We employ Top2Vec to learn semantic document embeddings and automatically discover latent topics, and Node2Vec to capture structural relationships via a bipartite graph of legal documents. The embeddings are combined and clustered using KMeans, yielding coherent groupings of documents. Our computations on a legal document dataset demonstrate that the combined Top2Vec+Node2Vec approach improves clustering quality over text-only or graph-only embeddings. We conduct a sensitivity analysis of hyperparameters, such as the number of clusters and the dimensionality of the embeddings, and demonstrate that our method achieves competitive performance against baseline Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) models. Key findings indicate that while the pipeline presents an innovative approach to unsupervised legal document analysis by combining semantic topic modeling with graph embedding techniques, its efficacy is contingent upon the quality of initial topic generation and the representational power of the chosen embedding models for specialized legal language. Strategic recommendations include the exploration of domain-specific embeddings, more comprehensive hyperparameter tuning for Node2Vec, dynamic determination of cluster numbers, and robust human-in-the-loop validation processes to enhance legal relevance and trustworthiness. The pipeline demonstrates potential for exploratory legal data analysis and as a precursor to supervised learning tasks but requires further refinement and domain-specific adaptation for practical legal applications.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

2509.0099

Country:

North America > United States > California (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Asia > Middle East > Jordan (0.04)
Asia > India (0.04)

Genre:

Research Report (1.00)
Overview (0.88)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Add feedback

LLMs for LLMs: A Structured Prompting Methodology for Long Legal Documents

Klem, Strahinja, Moubayed, Noura Al

arXiv.org Artificial IntelligenceSep-3-2025

The rise of Large Language Models (LLMs) has had a profoundly transformative effect on a number of fields and domains. However, their uptake in Law has proven more challenging due to the important issues of reliability and transparency. In this study, we present a structured prompting methodology as a viable alternative to the often expensive fine-tuning, with the capability of tacking long legal documents from the CUAD dataset on the task of information retrieval. Each document is first split into chunks via a system of chunking and augmentation, addressing the long document problem. Then, alongside an engineered prompt, the input is fed into QWEN-2 to produce a set of answers for each question. Finally, we tackle the resulting candidate selection problem with the introduction of the Distribution-based Localisation and Inverse Cardinality Weighting heuristics. This approach leverages a general purpose model to promote long term scalability, prompt engineering to increase reliability and the two heuristic strategies to reduce the impact of the black box effect. Whilst our model performs up to 9\% better than the previously presented method, reaching state-of-the-art performance, it also highlights the limiting factor of current automatic evaluation metrics for question answering, serving as a call to action for future research. However, the chief aim of this work is to underscore the potential of structured prompt engineering as a useful, yet under-explored, tool in ensuring accountability and responsibility of AI in the legal domain, and beyond.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.02241

Country:

Europe (0.68)
Asia > China (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LLMs for Law: Evaluating Legal-Specific LLMs on Contract Understanding

Singh, Amrita, Karaca, H. Suhan, Joshi, Aditya, Paik, Hye-young, Jiang, Jiaojiao

arXiv.org Artificial IntelligenceAug-12-2025

Despite advances in legal NLP, no comprehensive evaluation covering multiple legal-specific LLMs currently exists for contract classification tasks in contract understanding. To address this gap, we present an evaluation of 10 legal-specific LLMs on three English language contract understanding tasks and compare them with 7 general-purpose LLMs. The results show that legal-specific LLMs consistently outperform general-purpose models, especially on tasks requiring nuanced legal understanding. Legal-BERT and Contracts-BERT establish new SOTAs on two of the three tasks, despite having 69% fewer parameters than the best-performing general-purpose LLM. We also identify CaseLaw-BERT and LexLM as strong additional baselines for contract understanding. Our results provide a holistic evaluation of legal-specific LLMs and will facilitate the development of more accurate contract understanding systems.

artificial intelligence, large language model, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.07849

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.68)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Law > Statutes (0.97)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Segment First, Retrieve Better: Realistic Legal Search via Rhetorical Role-Based Queries

Nigam, Shubham Kumar, Dubey, Tanmay, Shallum, Noel, Bhattacharya, Arnab

arXiv.org Artificial IntelligenceAug-4-2025

Legal precedent retrieval is a cornerstone of the common law system, governed by the principle of stare decisis, which demands consistency in judicial decisions. However, the growing complexity and volume of legal documents challenge traditional retrieval methods. TraceRetriever mirrors real-world legal search by operating with limited case information, extracting only rhetorically significant segments instead of requiring complete documents. Our pipeline integrates BM25, Vector Database, and Cross-Encoder models, combining initial results through Reciprocal Rank Fusion before final re-ranking. Rhetorical annotations are generated using a Hierarchical BiLSTM CRF classifier trained on Indian judgments. Evaluated on IL-PCR and COLIEE 2025 datasets, TraceRetriever addresses growing document volume challenges while aligning with practical search constraints, reliable and scalable foundation for precedent retrieval enhancing legal research when only partial case knowledge is available.

information retrieval, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2508.00679

Country: Asia > India (0.47)

Genre: Research Report > New Finding (0.93)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.73)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Legal Document Summarization: Enhancing Judicial Efficiency through Automation Detection

Li, Yongjie, Nong, Ruilin, Liu, Jianan, Evans, Lucas

arXiv.org Artificial IntelligenceJul-28-2025

Legal document summarization represents a significant advancement towards improving judicial efficiency through the automation of key information detection. Our approach leverages state-of-the-art natural language processing techniques to meticulously identify and extract essential data from extensive legal texts, which facilitates a more efficient review process. By employing advanced machine learning algorithms, the framework recognizes underlying patterns within judicial documents to create precise summaries that encapsulate the crucial elements. This automation alleviates the burden on legal professionals, concurrently reducing the likelihood of overlooking vital information that could lead to errors. Through comprehensive experiments conducted with actual legal datasets, we demonstrate the capability of our method to generate high-quality summaries while preserving the integrity of the original content and enhancing processing times considerably. The results reveal marked improvements in operational efficiency, allowing legal practitioners to direct their efforts toward critical analytical and decision-making activities instead of manual reviews. This research highlights promising technology-driven strategies that can significantly alter workflow dynamics within the legal sector, emphasizing the role of automation in refining judicial processes.

efficiency, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2507.18952

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval

Banerjee, Puspendu, Mazumdar, Aritra, Ansar, Wazib, Goswami, Saptarsi, Chakrabarti, Amlan

arXiv.org Artificial IntelligenceJul-3-2025

The judiciary, as one of democracy's three pillars, is dealing with a rising amount of legal issues, needing careful use of judicial resources. This research presents a complex framework that leverages Data Science methodologies, notably Large Language Models (LLM) and Retrieval-Augmented Generation (RAG) techniques, to improve the efficiency of analyzing Calcutta High Court verdicts. Our framework focuses on two key aspects: first, the creation of a robust summarization mechanism that distills complex legal texts into concise and coherent summaries; and second, the development of an intelligent system for retrieving similar cases, which will assist legal professionals in research and decision making. By fine-tuning the Pegasus model using case head note summaries, we achieve significant improvements in the summarization of legal cases. Our two-step summarizing technique preserves crucial legal contexts, allowing for the production of a comprehensive vector database for RAG. The RAG-powered framework efficiently retrieves similar cases in response to user queries, offering thorough overviews and summaries. This technique not only improves legal research efficiency, but it also helps legal professionals and students easily acquire and grasp key legal information, benefiting the overall legal scenario.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.01058

Country: Asia > India > West Bengal > Kolkata (0.73)

Genre: Research Report (0.82)

Industry: Law > Government & the Courts (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The impact of fine tuning in LLaMA on hallucinations for named entity extraction in legal documentation

Vargas, Francisco, Coene, Alejandro González, Escalante, Gaston, Lobón, Exequiel, Pulido, Manuel

arXiv.org Artificial IntelligenceJun-11-2025

The extraction of information about traffic accidents from legal documents is crucial for quantifying insurance company costs. Extracting entities such as percentages of physical and/or psychological disability and the involved compensation amounts is a challenging process, even for experts, due to the subtle arguments and reasoning in the court decision. A two-step procedure is proposed: first, segmenting the document identifying the most relevant segments, and then extracting the entities. For text segmentation, two methodologies are compared: a classic method based on regular expressions and a second approach that divides the document into blocks of n-tokens, which are then vectorized using multilingual models for semantic searches (text-embedding-ada-002/MiniLM-L12-v2 ). Subsequently, large language models (LLaMA-2 7b, 70b, LLaMA-3 8b, and GPT-4 Turbo) are applied with prompting to the selected segments for entity extraction. For the LLaMA models, fine-tuning is performed using LoRA. LLaMA-2 7b, even with zero temperature, shows a significant number of hallucinations in extractions which are an important contention point for named entity extraction. This work shows that these hallucinations are substantially reduced after finetuning the model. The performance of the methodology based on segment vectorization and subsequent use of LLMs significantly surpasses the classic method which achieves an accuracy of 39.5%. Among open-source models, LLaMA-2 70B with finetuning achieves the highest accuracy 79.4%, surpassing its base version 61.7%. Notably, the base LLaMA-3 8B model already performs comparably to the finetuned LLaMA-2 70B model, achieving 76.6%, highlighting the rapid progress in model development. Meanwhile, GPT-4 Turbo achieves the highest accuracy at 86.1%.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2506.08827

Country: South America > Argentina (0.14)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Banking & Finance > Insurance (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback