Tran, Vu
VLSP 2023 -- LTER: A Summary of the Challenge on Legal Textual Entailment Recognition
Tran, Vu, Nguyen, Ha-Thanh, Vo, Trung, Luu, Son T., Dang, Hoang-Anh, Le, Ngoc-Cam, Le, Thi-Thuy, Nguyen, Minh-Tien, Nguyen, Truong-Son, Nguyen, Le-Minh
In this new era of rapid AI development, especially in language processing, the demand for AI in the legal domain is increasingly critical. While research in other languages such as English, Japanese, and Chinese is well established, we introduce the first fundamental research on the Vietnamese language in the legal domain: legal textual entailment recognition, conducted through the Vietnamese Language and Speech Processing workshop. In analyzing participants' results, we discuss linguistic aspects critical to the legal domain that pose challenges yet to be addressed.
Encoded Summarization: Summarizing Documents into Continuous Vector Space for Legal Case Retrieval
Tran, Vu, Nguyen, Minh Le, Tojo, Satoshi, Satoh, Ken
We explore the benefits of combining lexical features and latent features generated with neural networks. Our experiments show that the two kinds of features complement each other to improve the retrieval system's performance. Furthermore, our experimental results suggest that case summarization matters in two respects: using provided summaries and performing encoded summarization. Our approach achieved F1 scores of 65.6% and 57.6% on the experimental datasets of legal case retrieval tasks.
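The idea of interpolating lexical and latent evidence can be illustrated with a minimal sketch. Everything here is hypothetical, not the paper's implementation: term-frequency cosine stands in for the paper's lexical features, plain dense-vector cosine stands in for neural latent features, and `alpha` is an illustrative mixing weight.

```python
import math
from collections import Counter

def lexical_score(query_terms, doc_terms):
    """Cosine similarity over term-frequency vectors (a stand-in for TF-IDF features)."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

def latent_score(query_vec, doc_vec):
    """Cosine similarity between dense vectors (e.g. produced by a neural encoder)."""
    dot = sum(a * b for a, b in zip(query_vec, doc_vec))
    norm = (math.sqrt(sum(a * a for a in query_vec))
            * math.sqrt(sum(b * b for b in doc_vec)))
    return dot / norm if norm else 0.0

def combined_score(query_terms, doc_terms, query_vec, doc_vec, alpha=0.5):
    """Interpolate lexical and latent evidence; alpha is a tunable mixing weight."""
    return (alpha * lexical_score(query_terms, doc_terms)
            + (1 - alpha) * latent_score(query_vec, doc_vec))
```

Ranking candidate cases by `combined_score` lets each signal compensate for the other: lexical overlap catches exact legal terminology, while the latent score can match paraphrases that share no terms.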
Law to Binary Tree -- A Formal Interpretation of Legal Natural Language
Nguyen, Ha-Thanh, Tran, Vu, Le, Ngoc-Cam, Le, Thi-Thuy, Nguyen, Quang-Huy, Nguyen, Le-Minh, Satoh, Ken
Knowledge representation and reasoning in law are essential to facilitate the automation of legal analysis and decision-making tasks. In this paper, we propose a new approach based on legal science, specifically legal taxonomy, for representing and reasoning with legal documents. Our approach interprets the regulations in legal documents as binary trees, which enables legal reasoning systems to make decisions and resolve logical contradictions. The advantages of this approach are twofold. First, legal reasoning can be performed on the basis of the binary tree representation of the regulations. Second, the binary tree representation of the regulations is more understandable than existing sentence-based representations. We provide an example of how our approach can be used to interpret the regulations in a legal document.
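A minimal sketch of the binary-tree idea, assuming the paper's details differ: internal nodes hold logical connectives, leaves hold atomic conditions, and deciding whether a regulation applies becomes a recursive evaluation over established facts. The node layout and the example clause are illustrative, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Binary-tree node: internal nodes carry a connective, leaves an atomic condition."""
    label: str                      # "AND", "OR", or an atomic condition name
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def evaluate(node: Node, facts: set) -> bool:
    """Decide whether the regulation is satisfied given a set of established facts."""
    if node.left is None and node.right is None:   # leaf: atomic condition
        return node.label in facts
    l, r = evaluate(node.left, facts), evaluate(node.right, facts)
    return (l and r) if node.label == "AND" else (l or r)

# Hypothetical clause: "A contract is valid if the parties have capacity
# AND (it is written OR it is notarized)."
rule = Node("AND",
            Node("capacity"),
            Node("OR", Node("written"), Node("notarized")))
```

On this representation, a contradiction between two regulations can be detected mechanically: no assignment of facts satisfies both trees at once.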
Attentive Deep Neural Networks for Legal Document Retrieval
Nguyen, Ha-Thanh, Phi, Manh-Kien, Ngo, Xuan-Bach, Tran, Vu, Nguyen, Le-Minh, Tu, Minh-Phuong
Legal text retrieval serves as a key component in a wide range of legal text processing tasks such as legal question answering, legal case entailment, and statute law retrieval. The performance of legal text retrieval depends, to a large extent, on the representation of text, both queries and legal documents. Given good representations, a legal text retrieval model can effectively match a query to its relevant documents. Because legal documents often contain long articles of which only some parts are relevant to a query, representing such documents is quite a challenge for existing models. In this paper, we study the use of attentive neural network-based text representation for statute law document retrieval. We propose a general approach using deep neural networks with attention mechanisms. Based on it, we develop two hierarchical architectures with sparse attention to represent long sentences and articles, named Attentive CNN and Paraformer. The methods are evaluated on datasets of different sizes and characteristics in English, Japanese, and Vietnamese. Experimental results show that: i) attentive neural methods substantially outperform non-neural methods in retrieval performance across datasets and languages; ii) pretrained transformer-based models achieve better accuracy on small datasets at the cost of high computational complexity, while the lighter-weight Attentive CNN achieves better accuracy on large datasets; and iii) our proposed Paraformer outperforms state-of-the-art methods on the COLIEE dataset, achieving the highest recall and F2 scores in the top-N retrieval task.
Transformer-based Approaches for Legal Text Processing
Nguyen, Ha-Thanh, Nguyen, Minh-Phuong, Vuong, Thi-Hai-Yen, Bui, Minh-Quan, Nguyen, Minh-Chau, Dang, Tran-Binh, Tran, Vu, Nguyen, Le-Minh, Satoh, Ken
In this paper, we introduce our approaches using Transformer-based models for different problems of the COLIEE 2021 automatic legal text processing competition. Automated processing of legal documents is a challenging task because of the characteristics of legal documents and the limited amount of data. Through detailed experiments, we found that Transformer-based pretrained language models can perform well on automated legal text processing problems with appropriate approaches. We describe in detail the processing steps for each task, such as problem formulation, data processing and augmentation, pretraining, and fine-tuning. In addition, we introduce to the community two pretrained models that take advantage of parallel translations in the legal domain, NFSP and NMSP. Of these, NFSP achieves the state-of-the-art result in Task 5 of the competition. Although the paper focuses on technical reporting, its approaches can also serve as a useful reference for automated legal document processing using Transformer-based models.
ParaLaw Nets -- Cross-lingual Sentence-level Pretraining for Legal Text Processing
Nguyen, Ha-Thanh, Tran, Vu, Nguyen, Phuong Minh, Vuong, Thi-Hai-Yen, Bui, Quan Minh, Nguyen, Chau Minh, Dang, Binh Tran, Nguyen, Minh Le, Satoh, Ken
Ambiguity is a characteristic of natural language that makes the expression of ideas flexible. However, in a domain that requires accurate statements, it becomes a barrier. Specifically, a single word can have many meanings, and multiple words can share the same meaning. When translating a text into a foreign language, the translator needs to determine the exact meaning of each element in the original sentence to produce a correct translation. From that observation, in this paper, we propose ParaLaw Nets, a family of pretrained models that use sentence-level cross-lingual information to reduce ambiguity and improve performance on legal text processing. This approach achieved the best result in the Question Answering task of COLIEE-2021.
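One way to realize sentence-level cross-lingual pretraining is to build classification examples from a sentence-aligned parallel corpus: pair a source sentence with the foreign-language translation of the sentence that follows it (positive) or with a random foreign sentence (negative). This sketch is only a guess at the flavor of such an objective; the actual NFSP and NMSP pretraining setups are defined in the paper.

```python
import random

def make_pairs(src_sents, tgt_sents, seed=0):
    """Build (source sentence, foreign candidate, label) training examples.
    Label 1: the candidate is the translation of the *next* source sentence.
    Label 0: the candidate is a randomly sampled foreign sentence."""
    rng = random.Random(seed)
    pairs = []
    for i in range(len(src_sents) - 1):
        pairs.append((src_sents[i], tgt_sents[i + 1], 1))  # true next, foreign side
        j = rng.randrange(len(tgt_sents))
        if j == i + 1:                                     # avoid a false negative
            j = (j + 1) % len(tgt_sents)
        pairs.append((src_sents[i], tgt_sents[j], 0))      # random negative
    return pairs
```

Because the model must judge whether a foreign sentence plausibly continues the source text, it is pushed to resolve each word's meaning across languages, which is exactly the disambiguation pressure described above.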