OpenAI Cribbed Our Tax Example, But Can GPT-4 Really Do Tax?
Blair-Stanek, Andrew; Holzenberger, Nils; Van Durme, Benjamin
–arXiv.org Artificial Intelligence
… Maryland; Nils Holzenberger is an associate professor in … This work has been supported by the U.S. National Science Foundation under grant No. 2204926.
Federal content, please visit www.taxnotes.com.
The presenter pasted in what he called "about 16 pages' worth of tax code" … These seven sentences about Alice, Bob, and Charlie come word-for-word from a handcrafted data set we developed at Johns Hopkins University and published in 2020 for training and measuring AI models for reasoning over statutory language. Every word, punctuation mark, and number in the taxpayer facts comes exactly from our tax_case_9, even the percent sign at the start of the line.
The presenter then gives directions to GPT-4: "Now calculate their total liability." GPT-4 gives detailed step-by-step calculations and concludes that "Alice and Bob's total tax liability for 2018 is …"
The entire livestream is available at OpenAI, "GPT-4 Developer …" The tax law example starts at minute 19:11.
Where did the "about 16 pages' worth of tax code" come from? Again, from our 2020 data set. SARA has two … Of the … handcrafted cases in SARA, tax_case_9 is one; go to the directory "Cases" to find the file tax_case_9.pl, which is written in the programming language Prolog. The statutes consist of nine sections of the … These are SARA's heavily edited version of the IRC, … and remove ambiguity. (The author Holzenberger did all the handcrafting and hand editing.) If you put all the SARA statutes into a single file, it will be about 16 pages long (depending on the font).
From minute 20:07 to 20:40 of the livestream, we see some of the tax sections pasted into GPT-4. For example, at 20:23, we see part of section 63(c) with the paragraphs jumping from (3) to (5); in SARA, we edited out paragraph (4). At 20:26, we see part of section 63(c)(6) with only subparagraphs (A), (B), and (D); in SARA, we edited out (C). At 20:40, we see parts of section 3306(b) with the paragraphs jumping from (2) to (7); in SARA, we edited out paragraphs (3) through (6). At 20:39, we see sections 3301 and 3306 regarding the federal unemployment tax; while these two sections are irrelevant to Alice and Bob's tax liability in tax_case_9, they are two …
One of our edits was paring section 1 down to only subsections 1(a) through (d), which contain the Clinton-era tax rates. We cut section 1(j), which contains the reduced Tax Cuts and Jobs Act (TCJA) rates for 2018-2025. This editing explains why GPT-4 got the wrong answer on the livestream for Alice and Bob's 2018 taxes. We did not, however, edit out the TCJA standard deduction increase at … deduction for 2018 was $24,000.
You can download our data set and compare it with the livestream's recording on YouTube.
We empirically verified that using the SARA version of the IRC causes GPT-4 to get the wrong answer. First, we pasted into GPT-4 all nine SARA statutes, plus our facts about Alice, Bob, and Charlie. Then we used the same "Now calculate their total liability" command.
Feb-7-2024
- Country:
- North America > United States > Maryland (0.24)
- Genre:
- Research Report (0.70)
- Industry:
- Government
- Law > Taxation Law (1.00)
- Technology: