Large Language Models in Legislative Content Analysis: A Dataset from the Polish Parliament

arXiv.org Artificial Intelligence

Large language models (LLMs) are among the best methods for processing natural language, partly due to their versatility. At the same time, domain-specific LLMs are more practical in real-life applications. This work introduces a novel natural language dataset created by acquiring data from the websites of official legislative authorities. The study formulates three natural language processing (NLP) tasks to evaluate the effectiveness of LLMs on legislative content analysis within the context of the Polish legal system. Key findings highlight the potential of LLMs in automating and enhancing legislative content analysis while emphasizing specific challenges, such as understanding legal context. The research contributes to the advancement of NLP in the legal field, particularly in the Polish language. It demonstrates that even commonly accessible data can be practically utilized for legislative content analysis.


CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation

arXiv.org Artificial Intelligence

Legal case documents play a critical role in judicial proceedings. As the number of cases continues to rise, the reliance on manual drafting of legal case documents is facing increasing pressure and challenges. The development of large language models (LLMs) offers a promising solution for automating document generation. However, existing benchmarks fail to fully capture the complexities involved in drafting legal case documents in real-world scenarios. To address this gap, we introduce CaseGen, the benchmark for multi-stage legal case documents generation in the Chinese legal domain. CaseGen is based on 500 real case samples annotated by legal experts and covers seven essential case sections. It supports four key tasks: drafting defense statements, writing trial facts, composing legal reasoning, and generating judgment results. To the best of our knowledge, CaseGen is the first benchmark designed to evaluate LLMs in the context of legal case document generation. To ensure an accurate and comprehensive evaluation, we design the LLM-as-a-judge evaluation framework and validate its effectiveness through human annotations. We evaluate several widely used general-domain LLMs and legal-specific LLMs, highlighting their limitations in case document generation and pinpointing areas for potential improvement. This work marks a step toward a more effective framework for automating legal case documents drafting, paving the way for the reliable application of AI in the legal field. The dataset and code are publicly available at https://github.com/CSHaitao/CaseGen.


LegalBench.PT: A Benchmark for Portuguese Law

arXiv.org Artificial Intelligence

The recent application of LLMs to the legal field has spurred the creation of benchmarks across various jurisdictions and languages. However, no benchmark has yet been specifically designed for the Portuguese legal system. In this work, we present LegalBench.PT, the first comprehensive legal benchmark covering key areas of Portuguese law. To develop LegalBench.PT, we first collect long-form questions and answers from real law exams, and then use GPT-4o to convert them into multiple-choice, true/false, and matching formats. Once generated, the questions are filtered and processed to improve the quality of the dataset. To ensure accuracy and relevance, we validate our approach by having a legal professional review a sample of the generated questions. Although the questions are synthetically generated, we show that their basis in human-created exams, together with our rigorous filtering and processing, results in a reliable benchmark for assessing LLMs' legal knowledge and reasoning abilities. Finally, we evaluate the performance of leading LLMs on LegalBench.PT and investigate potential biases in GPT-4o's responses. We also assess the performance of Portuguese lawyers on a sample of questions to establish a baseline for model comparison and validate the benchmark.


Legal Evalutions and Challenges of Large Language Models

arXiv.org Artificial Intelligence

In this paper, we review legal testing methods based on Large Language Models (LLMs), using the OpenAI o1 model as a case study to evaluate the performance of large models in applying legal provisions. We compare current state-of-the-art LLMs, including open-source models, closed-source models, and models trained specifically for the legal domain. Through systematic testing on English and Chinese legal cases, drawn from common law systems and from China, this paper explores the strengths and weaknesses of LLMs in understanding and applying legal texts, reasoning through legal issues, and predicting judgments. The experimental results highlight both the potential and limitations of LLMs in legal applications, particularly the challenges of interpreting legal language and reasoning accurately about the law. Finally, the paper provides a comprehensive analysis of the advantages and disadvantages of various types of models, offering valuable insights and references for the future application of AI in the legal field.


The Use of Readability Metrics in Legal Text: A Systematic Literature Review

arXiv.org Artificial Intelligence

Understanding the text in legal documents can be challenging due to their complex structure and the inclusion of domain-specific jargon. Laws and regulations are often crafted in such a manner that engagement with them requires formal training, potentially leading to vastly different interpretations of the same texts. Linguistic complexity is an important contributor to the difficulties experienced by readers. Simplifying texts could enhance comprehension across a broader audience, not just among trained professionals. Various metrics have been developed to measure document readability. Therefore, we adopted a systematic review approach to examine the linguistic and readability metrics currently employed for legal and regulatory texts. A total of 3566 initial papers were screened, with 34 relevant studies found and further assessed. Our primary objective was to identify which current metrics were applied for evaluating readability within the legal field. Sixteen different metrics were identified, with the Flesch-Kincaid Grade Level being the most frequently used method. The majority of studies (73.5%) were found in the domain of "informed consent forms". From the analysis, it is clear that not all legal domains are well represented in terms of readability metrics and that there is a further need to develop more consensus on which metrics should be applied for legal documents.
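For readers unfamiliar with the Flesch-Kincaid Grade Level mentioned above, the following is a minimal Python sketch of the standard published formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. It is not taken from the reviewed paper. The function names are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed here for illustration only; real tools typically use pronunciation dictionaries.

```python
import re


def flesch_kincaid_grade(n_words, n_sentences, n_syllables):
    """Standard Flesch-Kincaid Grade Level from raw counts."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59


def count_syllables(word):
    """Assumed heuristic: one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def grade_of_text(text):
    """Naive tokenization: sentences end at . ! or ?, words are letter runs."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_kincaid_grade(len(words), len(sentences), syllables)
```

With counts of 100 words, 5 sentences, and 150 syllables, the formula gives about 9.91, roughly a tenth-grade reading level. Legal text, with its long sentences and polysyllabic terms, tends to score much higher, and the naive tokenizer will miscount abbreviations such as "U.S.C." as sentence breaks.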


Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges

arXiv.org Artificial Intelligence

Natural Language Processing is revolutionizing the way legal professionals and laypersons operate in the legal field. The considerable potential for Natural Language Processing in the legal sector, especially in developing computational tools for various legal processes, has captured the interest of researchers for years. This survey follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses framework, reviewing 148 studies, with a final selection of 127 after manual filtering. It explores foundational concepts related to Natural Language Processing in the legal domain, illustrating the unique aspects and challenges of processing legal texts, such as extensive document length, complex language, and limited open legal datasets. We provide an overview of Natural Language Processing tasks specific to legal text, such as Legal Document Summarization, legal Named Entity Recognition, Legal Question Answering, Legal Text Classification, and Legal Judgment Prediction. In the section on legal Language Models, we analyze both developed Language Models and approaches for adapting general Language Models to the legal domain. Additionally, we identify 15 Open Research Challenges, including bias in Artificial Intelligence applications, the need for more robust and interpretable models, and improving explainability to handle the complexities of legal language and reasoning.


On Ambiguity and the Expressive Function of Law: The Role of Pragmatics in Smart Legal Ecosystems

arXiv.org Artificial Intelligence

This is a long paper, an essay, on ambiguity, pragmatics, legal ecosystems, and the expressive function of law. It is divided into two parts and fifteen sections. The first part (Pragmatics) addresses ambiguity from the perspective of linguistic and cognitive pragmatics in the legal field. The second part (Computing) deals with this issue from the point of view of human-centered design and artificial intelligence, specifically focusing on the notion and modelling of rules and what it means to comply with the rules. This is necessary for the scaffolding of smart legal ecosystems (SLE). I will develop this subject with the example of the architecture, information flows, and smart ecosystem of OPTIMAI, an EU project of Industry 4.0 for zero-defect manufacturing (Optimizing Manufacturing Processes through Artificial Intelligence and Virtualization).


Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model

arXiv.org Artificial Intelligence

With the development of large-scale Language Models (LLMs), fine-tuning a pre-trained LLM has become a mainstream paradigm for solving downstream natural language processing tasks. However, training a language model in the legal field requires a large number of legal documents so that the model can learn legal terminology and the particular format of legal documents. Typical NLP approaches rely on many manually annotated data sets for training. In legal applications, however, it is difficult to obtain a large number of manually annotated data sets, which restricts the use of these approaches for drafting legal documents. The experimental results of this paper show that not only can we leverage a large number of annotation-free legal documents without Chinese word segmentation to fine-tune a large-scale language model, but, more importantly, we can fine-tune a pre-trained LLM on a local computer to generate legal document drafts while protecting information privacy and improving information security.

Introduction: In recent years, researchers have applied neural networks to natural language processing, achieving state-of-the-art performance in processing legal documents, such as tasks related to textual entailment [1] and legal question answering [2]. With the development of large-scale Language Models (LLMs), fine-tuning a pre-trained LLM to address natural language processing tasks such as these has become a mainstream paradigm [3]. However, challenges persist when employing natural language text generation techniques as a solution for highly specialized tasks like legal document drafting.


Gender Bias Detection in Court Decisions: A Brazilian Case Study

arXiv.org Artificial Intelligence

Data derived from the realm of the social sciences is often produced in digital text form, which motivates its use as a source for natural language processing methods. Researchers and practitioners have developed and relied on artificial intelligence techniques to collect, process, and analyze documents in the legal field, especially for tasks such as text summarization and classification. While increasing procedural efficiency is often the primary motivation behind natural language processing in the field, several works have proposed solutions for human rights-related issues, such as assessment of public policy and institutional social settings. One such issue is the presence of gender biases in court decisions, which has been largely studied in social sciences fields; biased institutional responses to gender-based violence are a violation of international human rights dispositions since they prevent gender minorities from accessing rights and hamper their dignity. Natural language processing-based approaches can help detect these biases on a larger scale. Still, the development and use of such tools require researchers and practitioners to be mindful of legal and ethical aspects concerning data sharing and use, reproducibility, domain expertise, and value-charged choices. In this work, we (a) present an experimental framework developed to automatically detect gender biases in court decisions issued in Brazilian Portuguese and (b) describe and elaborate on features we identify to be critical in such a technology, given its proposed use as a support tool for research and assessment of court activity.


Judges in England, Wales approved for limited, cautious AI use: 'Can't hold back the floodgates'

FOX News

Judges in England and Wales will have approval for "careful use" of artificial intelligence (AI) to help produce rulings, but experts remain divided over how extensively judges or the wider law profession should seek to use the technology. "I would say AI is probably appropriate to cast a wide net to gather as much information as possible," William A. Jacobson, a Cornell University Law professor and founder of the Equal Protection Project, told Fox News Digital. "That might inform your decision, but I don't think it is at a place now – and I don't know if it ever will be – that it can actually do the sorting … and make the sort of decisions and determinations that you need to make, whether it's as a judge or a lawyer," Jacobson said. The Courts and Tribunals Judiciary, the body of various judges, magistrates, tribunal members and coroners in England and Wales, decided that judges may use AI to write opinions, and only opinions, with no leeway to use the technology for research or legal analyses due to the potential for AI to fabricate information and provide misleading, inaccurate and biased information. Caution over AI's use in the legal field partially stems from a few high-profile blunders that resulted from lawyers experimenting with the tech, which produced court filings that included references to fictional cases, known as "hallucinations."