 ediscovery


Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery

Lahiri, Sounak, Pai, Sumit, Weninger, Tim, Bhattacharya, Sanmitra

arXiv.org Artificial Intelligence

Electronic Discovery (eDiscovery) involves identifying relevant documents from a vast collection based on legal production requests. The integration of artificial intelligence (AI) and natural language processing (NLP) has transformed this process, aiding document review and enhancing efficiency and cost-effectiveness. Although traditional approaches like BM25 or fine-tuned pre-trained models are common in eDiscovery, they face performance, computational, and interpretability challenges. In contrast, Large Language Model (LLM)-based methods prioritize interpretability but sacrifice performance and throughput. This paper introduces DISCOvery Graph (DISCOG), a hybrid approach that combines the strengths of two worlds: a heterogeneous graph-based method for accurate document relevance prediction and a subsequent LLM-driven approach for reasoning. Graph representational learning generates embeddings and predicts links, ranking the corpus for a given request, and the LLMs provide reasoning for document relevance. Our approach handles datasets with balanced and imbalanced distributions, outperforming baselines in F1-score, precision, and recall by an average of 12%, 3%, and 16%, respectively. In an enterprise context, our approach drastically reduces document review costs by 99.9% compared to manual processes and by 95% compared to LLM-based classification methods.
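As a rough illustration of the "predict links, then rank the corpus" idea in the abstract above, the following minimal sketch scores each document against a request and sorts by score. It is not the DISCOG implementation: the sigmoid-of-dot-product decoder, the function names `link_score` and `rank_documents`, and the toy embedding values are all assumptions made for illustration only.

```python
import math


def link_score(request_vec, doc_vec):
    """Score a request-document link as the sigmoid of the dot product.

    This decoder is an assumption for illustration; the abstract does not
    say which link-prediction decoder the authors use.
    """
    dot = sum(r * d for r, d in zip(request_vec, doc_vec))
    return 1.0 / (1.0 + math.exp(-dot))


def rank_documents(request_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from highest to lowest score."""
    scored = [(doc_id, link_score(request_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, made-up embeddings (hypothetical values, not from any dataset).
request = [0.9, 0.1, -0.3]
documents = {
    "doc_a": [0.8, 0.0, -0.2],
    "doc_b": [-0.5, 0.4, 0.9],
    "doc_c": [0.1, 0.1, 0.0],
}

ranking = rank_documents(request, documents)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In a pipeline of the kind the abstract describes, only the top-ranked documents would then be passed to an LLM for relevance reasoning, which is one plausible way the reported throughput and cost savings could arise.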


Opening the TAR Black Box: Developing an Interpretable System for eDiscovery Using the Fuzzy ARTMAP Neural Network

Courchaine, Charles, Sethi, Ricky J.

arXiv.org Artificial Intelligence

Technology-assisted review (TAR) utilizes an information retrieval system to discover all, or nearly all, the relevant documents in a corpus and help reduce the human effort required to find these documents [7, 9, 20]. TAR systems are employed in high-recall tasks such as e-discovery, systematic literature reviews, evidence-based ...

RCV1-v2 and Jeb Bush emails corpora are frequently used in e-discovery evaluations [20, 22] because legal matters are often confidential [7, 9] and their corpora are unavailable. The 20Newsgroups corpus is commonly used as a test corpus with ART-based algorithms [18, 19]; it and the Reuters-21578 corpus are also commonly used in evaluating text classification algorithms [1].


Magic and Hallucinations? Considering ChatGPT in eDiscovery

#artificialintelligence

Question: Can you comment on early usage of GPT in eDiscovery platforms? Does the Hallucination factor limit its use in the Enterprise where defensibility is critical? Answer: One vendor announced an early beta use just before LegalWeek and provided a video demo. The vendor did not directly reveal what approach they were using to support question-answering on an e-discovery corpus (such as the Enron corpus they use in the demo). Thus we could not tell whether they were training the model on the eDiscovery corpus directly to create a "private" model or whether they were taking Bing's "search-then-synthesize" approach of using the question to search for documents, then having GPT read the documents and answer the question based on this reading.


Reveal Expands Into South Korea with New Intellectual Data Partnership

#artificialintelligence

Reveal-Brainspace announced that Intellectual Data, an eDiscovery service provider in Korea, will be integrating Reveal's AI-powered eDiscovery, review & investigations platform – Reveal 11 – onto its suite of enterprise cloud services for legal and corporate entities throughout the region. Specifically, Intellectual Data will leverage Reveal's end-to-end, SaaS-based platform to offer eDiscovery hosting, business process optimization and consulting services to its clients – all underpinned by advanced AI and machine learning technology. "As the first partner of Reveal in Korea, we look forward to collaborating on evolving the service to mitigate the risk factors blocking global growth of our clients." "Korea is a lynchpin in Reveal's strategic growth initiative in the APAC region, which makes the partnership with one of Korea's most respected eDiscovery service providers even more significant," said Wendell Jisa, CEO of Reveal. "We're thrilled to work with the talented team at Intellectual Data to expand the reach of our Reveal 11 platform to the growing network of law firms and corporations in Korea looking to solve their most complex challenges with leading AI and review technology."


Law Firms of All Sizes Can Easily Integrate AI Tools Into eDiscovery

#artificialintelligence

Artificial intelligence tools have become prevalent in legal practice, particularly in eDiscovery. That doesn't mean, however, that law firms and litigation support teams have been quick to embrace them. Despite their benefits, many legal organizations have been hesitant to implement AI tools. In the ABA 2020 Legal Tech Survey, 23% of law firms reported not being interested in AI, while 34% said they didn't know enough about AI to speak to their firms' interest. While the survey showed that larger firms were more likely to adopt AI tools, that leaves a lot of room for smaller firms to use AI to their advantage.


The Use Of Artificial Intelligence In eDiscovery

#artificialintelligence

Editor's Note: As an industry leader in the use of artificial intelligence to empower cyber discovery and legal discovery efforts, HaystackID is excited to share this new information paper from the EDRM and to highlight the participation of HaystackID eDiscovery expert Matt Sinner as a contributor to this important educational effort. We also strongly support the educational and standardization initiatives of the EDRM and continue to be a proud partner of theirs as they empower the leaders of eDiscovery. Originally published by the Electronic Discovery Reference Model (EDRM). Please see full Publication below for more information.

EDRM Announces Publication of "The Use of Artificial Intelligence in eDiscovery"

#artificialintelligence

MINNEAPOLIS, October 18, 2021 – Setting the global standards for e-discovery, the Electronic Discovery Reference Model (EDRM) is pleased to announce the release of its artificial intelligence (AI) paper titled "The Use of Artificial Intelligence in eDiscovery." Kelly Atherton, senior manager cyber incident response at Norton Rose Fulbright, served as the project trustee. "We are inundated in the e-discovery space with broad talk of technologies that will help us perform our work more efficiently and accurately at a lower cost. But what does this all even mean?" asks Atherton. "Our drafting team comprised of attorneys, data scientists and legal technologists sought to create an objective, easy to digest overview for the bench and bar to aid them in better understanding the use of AI in e-discovery. We adopted a broad, working definition of AI for the purpose of this paper and discussed the types of AI used in e-discovery, common use cases and ethical considerations. Our hope is those new to AI can use this paper as a starting point to become a more informed consumer and adopter of AI in e-discovery."


Nuix and H5 Announce Strategic Partnership to Streamline Classification of Corporate Data

#artificialintelligence

H5 announced that it has teamed up with Nuix to integrate its document classification solutions with the market-leading Nuix processing engine. This strategic partnership will allow corporations to gain greater control of their data, prioritize downstream review and reduce the risks associated with sending data outside of the organization. Starting with the identification of privileged content and personally identifiable information (PII), this partnership enables H5 to expand its ability to identify and classify such documents behind the corporate firewall. Protecting sensitive data is business critical for many corporations driven in part by the rise of new regulatory requirements, data breaches and continued complexity in eDiscovery. However, for many corporations finding and categorizing PII and privileged data in the context of eDiscovery is a headache filled with manual processes and workarounds.


e-Discovery and Artificial Intelligence

#artificialintelligence

Ahead of the latest episode in the Boyes Turner tech podcast series, Prof. J. Mark Bishop shares his thoughts on 'e-Discovery and Artificial Intelligence'... Events unfold and you are dropped into the opening of a long and complex case with 500,000 emails to sift through and you're not even sure what you are looking for, who you are looking for, or when any incidents of interest may have occurred. Currently the review of documents is the most labour-intensive task of an e-discovery investigation, often consuming more than 75% of the project budget. This is largely because researchers review the documents manually. To put this into context, to review half a million documents by hand, at 25 documents an hour, would take around 20,000 person-hours. Hence, because it is practically impossible to review all documents in the target corpus by hand, results are too often limited by simple keyword searches. Unfortunately, coming up with responsive keywords is not trivial, as a researcher often does not know exactly what she is looking for beforehand.
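The manual-review estimate quoted above can be checked with a few lines of arithmetic. This is only a sketch: the document count and review rate come from the passage, while the function name `review_hours` and the 40-hour working week used for the person-weeks figure are assumptions added for illustration.

```python
def review_hours(num_documents, docs_per_hour):
    """Person-hours needed to review every document at a fixed hourly rate."""
    return num_documents / docs_per_hour


# Figures from the passage: half a million documents at 25 documents an hour.
hours = review_hours(500_000, 25)

# Assumption (not from the passage): a 40-hour working week per reviewer.
HOURS_PER_WEEK = 40
person_weeks = hours / HOURS_PER_WEEK

print(f"{hours:,.0f} person-hours")
print(f"{person_weeks:,.0f} person-weeks at {HOURS_PER_WEEK} hours per week")
```

Under these assumptions the review effort works out to 20,000 person-hours, matching the figure in the passage, or 500 person-weeks.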