Blair-Stanek, Andrew
LLMs Provide Unstable Answers to Legal Questions
Blair-Stanek, Andrew, Van Durme, Benjamin
An LLM is stable if it reaches the same conclusion when asked the identical question multiple times. We find leading LLMs like gpt-4o, claude-3.5, and gemini-1.5 are unstable when providing answers to hard legal questions, even when made as deterministic as possible by setting temperature to 0. We curate and release a novel dataset of 500 legal questions distilled from real cases, each involving two parties, with facts, competing legal arguments, and the question of which party should prevail. When provided the exact same question, we observe that LLMs sometimes say one party should win and other times say the other party should win. This instability has implications for the increasing number of legal AI products, legal processes, and lawyers relying on these LLMs.
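A minimal sketch of the kind of stability check this abstract describes, assuming the OpenAI Python SDK (>=1.0) with an API key set; the model name, question wording, and run count below are illustrative assumptions, not the paper's actual protocol.

```python
"""Repeatedly ask an LLM the identical question at temperature 0 and tally the answers."""
from collections import Counter

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask_once(question: str, model: str = "gpt-4o") -> str:
    """Ask the model a single question, as deterministically as the API allows."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,  # nominally deterministic decoding
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content.strip()


def stability(question: str, runs: int = 10) -> Counter:
    """Send the identical question `runs` times and count each distinct answer.

    A stable model would put all of its mass on one answer; any spread across
    answers is the instability the abstract refers to.
    """
    return Counter(ask_once(question) for _ in range(runs))


if __name__ == "__main__":
    q = ("Given the facts and competing arguments below, answer with exactly one word, "
         "'Petitioner' or 'Respondent': which party should prevail? ...")
    print(stability(q))
```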
OpenAI Cribbed Our Tax Example, But Can GPT-4 Really Do Tax?
Blair-Stanek, Andrew, Holzenberger, Nils, Van Durme, Benjamin
The presenter pasted in what he called "about 16 pages' worth of tax code." These seven sentences about Alice, Bob, and Charlie come word-for-word from a handcrafted data set we developed at Johns Hopkins University and published in 2020 for training and measuring AI models for reasoning over statutory language. Every word, punctuation mark, and number in the taxpayer facts comes exactly from our tax_case_9 -- even the percent sign at the start of the line. (The entire livestream, "GPT-4 Developer Livestream," is available from OpenAI on YouTube; the tax law example starts at minute 19:11. In our data set, go to the directory "Cases" to find the file tax_case_9.pl, which is written in the programming language Prolog. This work has been supported by the U.S. National Science Foundation under grant No. 2204926.)

Where did the "about 16 pages' worth of tax code" come from? Again, from our 2020 data set. SARA has two parts: statutes and cases. Tax_case_9 is one of the handcrafted cases in SARA, and the statutes consist of nine sections of the Internal Revenue Code. From minute 20:07 to 20:40 of the livestream, we see some of the tax sections pasted into GPT-4. These are SARA's heavily edited versions of the IRC, pared down to simplify them and remove ambiguity. If you put all the SARA statutes into a single file, it will be about 16 pages long (depending on the font). For example, at 20:23, we see part of section 63(c) with the paragraphs jumping from (3) to (5); in SARA, we edited out (4). At 20:26, we see part of section 63(c)(6) with only subparagraphs (A), (B), and (D); in SARA, we edited out (C). At 20:40, we see parts of section 3306(b) with the paragraphs jumping from (2) to (7); in SARA, we edited out paragraphs (3) through (6). At 20:39, we see sections 3301 and 3306 regarding the federal unemployment tax; while these two sections are irrelevant to Alice and Bob's tax liability in tax_case_9, they are two of the nine SARA statutes. The author Holzenberger did all the handcrafting and hand editing.

One of our edits was paring section 1 down to only sections 1(a) through (d), which contain the Clinton-era tax rates. We cut section 1(j), which contains the reduced Tax Cuts and Jobs Act rates for 2018-2025. This editing explains why GPT-4 got the wrong answer on the livestream for Alice and Bob's 2018 taxes. We did not, however, edit out the TCJA standard deduction increase, under which the standard deduction for 2018 was $24,000.

The presenter then gives directions to GPT-4: "Now calculate their total liability." GPT-4 gives detailed step-by-step calculations and concludes that "Alice and Bob's total tax liability for 2018 is . . . ." We empirically verified that using the SARA version of the IRC causes GPT-4 to get the wrong answer. You can download our data set and compare it with the livestream's recording on YouTube. First, we pasted into GPT-4 all nine SARA statutes, plus our facts about Alice, Bob, and Charlie. Then we used the same "Now calculate their total liability" command.
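A rough sketch of the replication described above: concatenate the SARA statutes and the tax_case_9 facts, then issue the livestream's "Now calculate their total liability" command. The local directory layout (a `sara/` checkout with a `statutes/` folder and `Cases/tax_case_9.pl`), the fact-extraction step, and the use of the OpenAI SDK are assumptions for illustration, not the authors' actual script.

```python
"""Paste the nine SARA statutes plus the tax_case_9 facts into GPT-4 and ask for total liability."""
from pathlib import Path

from openai import OpenAI

SARA_DIR = Path("sara")  # hypothetical local download of the 2020 data set


def load_statutes() -> str:
    # Concatenate SARA's edited statute sections (about 16 pages in total).
    return "\n\n".join(p.read_text() for p in sorted((SARA_DIR / "statutes").glob("*")) if p.is_file())


def load_facts() -> str:
    # tax_case_9.pl is a Prolog file; a real script would pull out only the
    # natural-language taxpayer facts about Alice, Bob, and Charlie.
    return (SARA_DIR / "Cases" / "tax_case_9.pl").read_text()


def ask_gpt4(prompt: str) -> str:
    client = OpenAI()  # assumes OPENAI_API_KEY is set
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


if __name__ == "__main__":
    prompt = load_statutes() + "\n\n" + load_facts() + "\n\nNow calculate their total liability."
    print(ask_gpt4(prompt))
```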
InteractiveIE: Towards Assessing the Strength of Human-AI Collaboration in Improving the Performance of Information Extraction
Mondal, Ishani, Yuan, Michelle, N, Anandhavelu, Garimella, Aparna, Ferraro, Francis, Blair-Stanek, Andrew, Van Durme, Benjamin, Boyd-Graber, Jordan
Learning template-based information extraction from documents is a crucial yet difficult task. Prior template-based IE approaches assume foreknowledge of the domain templates; however, real-world IE settings do not have pre-defined schemas, and the templates must be figured out as you go. To quickly bootstrap templates in a real-world setting, we need to induce template slots from documents with zero or minimal supervision. Since the purpose of question answering intersects with the goal of information extraction, we use automatic question generation to induce template slots from the documents and investigate how a tiny amount of proxy human supervision on the fly (termed InteractiveIE) can further boost performance. Extensive experiments on biomedical and legal documents, where obtaining training data is expensive, reveal encouraging trends of performance improvement using InteractiveIE over AI-only baselines.
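A toy illustration of the loop this abstract describes: induce candidate slots from automatically generated questions, then let a human correct them on the fly. This is not the InteractiveIE pipeline; the question generator is stubbed out with canned examples and the "human supervision" is a simple interactive rename prompt, purely to make the idea concrete.

```python
"""Induce template slots from generated questions, with a tiny human-in-the-loop correction step."""
from collections import defaultdict


def generate_questions(document: str) -> list[tuple[str, str]]:
    """Stub for an automatic question-generation model.

    Returns (question, answer-span) pairs; a real system would produce these
    with a trained QG model or an LLM prompted over the document.
    """
    return [
        ("Who filed the complaint?", "Acme Corp."),
        ("When was the contract signed?", "March 3, 2019"),
        ("What amount is in dispute?", "$1.2 million"),
    ]


def induce_slots(document: str) -> dict[str, list[str]]:
    """Turn each generated question into a candidate slot name (here, naively
    the normalized question string itself) and collect its answers as fillers."""
    slots: dict[str, list[str]] = defaultdict(list)
    for question, answer in generate_questions(document):
        slot = question.rstrip("?").lower().replace(" ", "_")
        slots[slot].append(answer)
    return slots


def human_feedback(slots: dict[str, list[str]]) -> dict[str, list[str]]:
    """Tiny proxy for on-the-fly supervision: the user may rename each induced
    slot (press Enter to keep it), the kind of cheap signal the abstract says
    can boost performance."""
    revised: dict[str, list[str]] = {}
    for slot, fillers in slots.items():
        new_name = input(f"Slot '{slot}' with fillers {fillers} -- rename to (Enter keeps): ").strip()
        revised[new_name or slot] = fillers
    return revised


if __name__ == "__main__":
    doc = "Acme Corp. filed the complaint over a contract signed March 3, 2019 ..."
    print(human_feedback(induce_slots(doc)))
```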
BLT: Can Large Language Models Handle Basic Legal Text?
Blair-Stanek, Andrew, Holzenberger, Nils, Van Durme, Benjamin
We find that the best publicly available LLMs like GPT-4 and PaLM 2 currently perform poorly at basic text handling required of lawyers or paralegals, such as looking up the text at a line of a witness deposition or at a subsection of a contract. We introduce a benchmark to quantify this poor performance, which casts into doubt LLMs' current reliability as-is for legal practice. Finetuning for these tasks brings an older LLM to near-perfect performance on our test set and also raises performance on a related legal task. This stark result highlights the need for more domain expertise in LLM training.
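A sketch of a BLT-style probe in the spirit of the task described above, not the released benchmark's actual code or data: build a numbered transcript, ask for the text at one line, and score by exact match. The synthetic transcript text and the `ask_model` stub are assumptions; the stub can be replaced with any LLM call.

```python
"""Probe whether a model can quote the text at a requested line of a numbered transcript."""
import random


def make_transcript(n_lines: int = 50) -> list[str]:
    # Synthetic deposition-like lines; the real benchmark uses legal text.
    return [f"Witness statement number {i}, regarding exhibit {chr(65 + i % 26)}." for i in range(1, n_lines + 1)]


def make_item(lines: list[str]) -> tuple[str, str]:
    """Build one (prompt, gold answer) pair asking for a random line."""
    target = random.randrange(1, len(lines) + 1)
    numbered = "\n".join(f"{i}  {text}" for i, text in enumerate(lines, start=1))
    prompt = f"{numbered}\n\nQuote exactly the text of line {target}, and nothing else."
    return prompt, lines[target - 1]


def score(ask_model, n_items: int = 20) -> float:
    """Fraction of items where the model's reply exactly matches the gold line."""
    lines = make_transcript()
    hits = 0
    for _ in range(n_items):
        prompt, gold = make_item(lines)
        hits += ask_model(prompt).strip() == gold
    return hits / n_items


if __name__ == "__main__":
    # Trivial stand-in "model" that always answers with line 1; swap in a real LLM call.
    print(score(lambda prompt: "Witness statement number 1, regarding exhibit B."))
```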
Can GPT-3 Perform Statutory Reasoning?
Blair-Stanek, Andrew, Holzenberger, Nils, Van Durme, Benjamin
Statutory reasoning is the task of reasoning with facts and statutes, which are rules written in natural language by a legislature. It is a basic legal skill. In this paper we explore the capabilities of the most capable GPT-3 model, text-davinci-003, on an established statutory-reasoning dataset called SARA. We consider a variety of approaches, including dynamic few-shot prompting, chain-of-thought prompting, and zero-shot prompting. While we achieve results with GPT-3 that are better than the previous best published results, we also identify several types of clear errors it makes. We investigate why these errors happen. We discover that GPT-3 has imperfect prior knowledge of the actual U.S. statutes on which SARA is based. More importantly, we create simple synthetic statutes, which GPT-3 is guaranteed not to have seen during training. We find GPT-3 performs poorly at answering straightforward questions about these simple synthetic statutes.
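A toy generator in the spirit of the simple synthetic statutes mentioned above; the paper's actual templates and vocabulary are not reproduced here, and the nonce terms and thresholds below are invented for illustration. The point is only that the statute is guaranteed unseen during training, so answering requires reading it.

```python
"""Generate a simple synthetic statute plus a yes/no case with a gold label."""
import random

NONCE_TERMS = ["blarg", "frimple", "quozzle", "dratch"]


def make_statute_and_case() -> tuple[str, str, bool]:
    term = random.choice(NONCE_TERMS)
    threshold = random.choice([3, 5, 8])
    amount = random.randint(1, 10)
    statute = (f"Section 101. Any person who owns more than {threshold} {term}s "
               f"shall pay a {term} duty.")
    case = f"Alice owns {amount} {term}s. Does Alice have to pay the {term} duty?"
    answer = amount > threshold  # gold label for the yes/no question
    return statute, case, answer


if __name__ == "__main__":
    statute, case, answer = make_statute_and_case()
    prompt = f"{statute}\n\n{case} Answer Yes or No."
    print(prompt, "\nGold:", "Yes" if answer else "No")
```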