AITopics

doi: 10.1007/s10032-025-00530-0

2512.09666

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Data Science > Data Mining > Text Mining (0.72)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Yashwant, Sai, Dubey, Anurag, Paikray, Praneeth, Thulsiram, Gantala

Invoice Information Extraction: Methods and Performance Evaluation

arXiv.org Artificial Intelligence, Oct-23-2025

This paper presents methods for extracting structured information from invoice documents and proposes a set of evaluation metrics (EM) to assess the accuracy of the extracted data against annotated ground truth. The approach involves pre-processing scanned or digital invoices and then applying Docling and LlamaCloud Services to identify and extract key fields such as invoice number, date, total amount, and vendor details. To ensure the reliability of the extraction process, we establish a robust evaluation framework comprising field-level precision, consistency check failures, and exact match accuracy. The proposed metrics provide a standardized way to compare different extraction methods and highlight strengths and weaknesses in field-specific performance.
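Two of the metrics named in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function names, the whitespace and case normalisation, and the exact definitions (precision counted over non-empty predicted fields, exact match requiring every ground-truth field of a document to agree) are all hypothetical choices. The consistency-check metric is left out because the abstract does not define it.

```python
from typing import Dict, List

Fields = Dict[str, str]  # field name -> extracted or annotated value


def _norm(value: str) -> str:
    # Assumed normalisation: collapse whitespace and ignore case.
    return " ".join(value.split()).lower()


def field_precision(preds: List[Fields], golds: List[Fields]) -> Dict[str, float]:
    """Per field: correct predictions / non-empty predictions."""
    correct: Dict[str, int] = {}
    attempted: Dict[str, int] = {}
    for pred, gold in zip(preds, golds):
        for name, value in pred.items():
            if not value.strip():
                continue  # an empty prediction is not counted as an attempt
            attempted[name] = attempted.get(name, 0) + 1
            if _norm(value) == _norm(gold.get(name, "")):
                correct[name] = correct.get(name, 0) + 1
    return {name: correct.get(name, 0) / n for name, n in attempted.items()}


def exact_match_accuracy(preds: List[Fields], golds: List[Fields]) -> float:
    """Fraction of documents whose ground-truth fields are all matched."""
    if not golds:
        return 0.0
    hits = 0
    for pred, gold in zip(preds, golds):
        if all(_norm(pred.get(k, "")) == _norm(v) for k, v in gold.items()):
            hits += 1
    return hits / len(golds)


# Made-up sample data, for illustration only.
golds = [
    {"invoice_number": "INV-001", "total": "120.50"},
    {"invoice_number": "INV-002", "total": "75.00"},
]
preds = [
    {"invoice_number": "inv-001", "total": "120.50"},
    {"invoice_number": "INV-002", "total": "57.00"},
]
per_field = field_precision(preds, golds)
doc_level = exact_match_accuracy(preds, golds)
```

Reporting the per-field numbers next to the document-level number shows where a method fails: a single weak field can drag exact match down even when the other fields are extracted reliably.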

large language model, machine learning, natural language, (15 more...)

2510.15727

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

arXiv.org Artificial Intelligence, May-30-2024

Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use

Cesista, Franz Louis, Aguiar, Rui, Kim, Jason, Acilo, Paolo

Business Document Information Extraction (BDIE) is the problem of transforming a blob of unstructured information (raw text, scanned documents, etc.) into a structured format that downstream systems can parse and use. It has two main tasks: Key-Information Extraction (KIE) and Line Items Recognition (LIR). In this paper, we argue that BDIE is best modeled as a Tool Use problem, where the tools are these downstream systems. We then present Retrieval Augmented Structured Generation (RASG), a novel general framework for BDIE that achieves state-of-the-art (SOTA) results on both KIE and LIR tasks on BDIE benchmarks. The contributions of this paper are threefold: (1) We show, with ablation benchmarks, that Large Language Models (LLMs) with RASG are already competitive with or surpass current SOTA Large Multimodal Models (LMMs) without RASG on BDIE benchmarks. (2) We propose a new metric class for Line Items Recognition, General Line Items Recognition Metric (GLIRM), that is more aligned with practical BDIE use cases compared to existing metrics, such as ANLS*, DocILE, and GriTS. (3) We provide a heuristic algorithm for backcalculating bounding boxes of predicted line items and tables without the need for vision encoders. Finally, we claim that, while LMMs might sometimes offer marginal performance benefits, LLMs + RASG is oftentimes superior given real-world applications and constraints of BDIE.

information extraction, retrieval augmented structured generation, structured generation, (11 more...)

2405.20245

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Santa Clara County > San Jose (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

#artificialintelligence, Mar-16-2022, 23:40:22 GMT

TripActions Launches AI and ML Tool, Auto-Itemization, to Save Time on Expense Reports

In development for more than a year, Auto-Itemization from TripActions Liquid is a solution that automatically splits transactions and attributes each line item to specific expense policies. The technology allows a user to upload a receipt for automatic itemisation; using AI, foreign language translation and fuzzy matching, each line item is categorised and assigned to a specific policy with 90 per cent accuracy. The use of machine learning also ensures that accuracy will increase over time. Single transactions often consist of multiple parts or line items -- like variable daily service charges for a hotel stay -- that need to be reported independently for accounting and tax purposes. The cost of the room needs to align with a company's hotel policy, while the room service that an employee orders can align to a per diem for meal costs.

auto-itemization, launch ai and ml tool, tripaction liquid, (13 more...)

Country: South America (0.06)

Technology: Information Technology > Artificial Intelligence (1.00)

#artificialintelligence, Dec-23-2021, 18:10:08 GMT

AI and Automation: Weapons in the Battle Against AP Fraud

Fraud costs enterprises about 5% of annual revenue, the Association of Certified Fraud Examiners (ACFE) noted in a recent report. Fraudsters look at the Accounts Payable (AP) department and see dollar signs because there are typically poor controls in place to prevent attacks. Because this business function processes payments, it's wide open for targeting by bad actors. AP fraud breaks out into a few typical patterns, which we'll examine below. The good news is that there are steps organizations can take to reduce the risk of these happening.

automation, fraud, payment, (11 more...)

Country:

North America > United States > Oregon (0.05)
North America > United States > Ohio (0.05)

Industry: Law Enforcement & Public Safety > Fraud (0.72)

Technology: Information Technology > Artificial Intelligence (0.51)

#artificialintelligence, Aug-16-2021, 13:50:51 GMT

Council Post: How AI Can Help To Prevent Accounts Payable Fraud

Chief Product Officer at Kanverse.ai. According to the "2020 Report to the Nations," published by the Association of Certified Fraud Examiners (ACFE), organizations lose an estimated 5% of annual revenue due to fraud. For organized scammers, accounts payable (AP) departments are often perceived as a poorly guarded cashbox they can target for scams or other malicious attacks. In most cases, payments are processed through these departments, which makes this function vulnerable and a prime target for theft. There are several common types of AP fraud, but fortunately, there are also steps organizations can take to reduce their risk.

fraud, invoice, payment, (11 more...)

Country:

North America > United States > Oregon (0.05)
North America > United States > Ohio (0.05)

Industry:

Law Enforcement & Public Safety > Fraud (0.72)
Information Technology > Security & Privacy (0.57)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.32)

#artificialintelligence, Feb-11-2020, 20:02:31 GMT

Dump the Spreadsheet. Leveraging AI for Automated Transaction Matching

As digital transformation sweeps across the enterprise landscape, F&A processes continue to evolve. The decades-old manual process of entering data into a spreadsheet for reconciliation purposes has given way to digital reconciliation, with the advent of automation technology to make it faster and more efficient. However, even automated processes have evolved in the last few years with the advances made in machine learning and AI. What do these advances mean for F&A teams today? To illustrate the profound implications of AI and machine learning for F&A, consider the evolution of the transaction matching process in reconciliation. From the earliest days, F&A departments have largely relied on manual processes to reconcile accounts.

automated transaction matching, automation, transaction, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.61)

Maurya, Chandresh Kumar, Gantayat, Neelamadhav, Dechu, Sampath, Horvath, Tomas

Online Similarity Learning with Feedback for Invoice Line Item Matching

arXiv.org Machine Learning, Jan-1-2020

The procure-to-pay (P2P) process in large enterprises is a back-end business process which deals with the procurement of products and services for enterprise operations. Procurement is done by issuing purchase orders to impaneled vendors, and invoices submitted by vendors are paid after they go through a rigorous validation process. Agents orchestrating the P2P process often encounter the problem of matching product or service descriptions in the invoice to those in the purchase order and verifying whether the ordered items are what have been supplied or serviced. For example, the descriptions in the invoice and purchase order could be "TRES 739mL CD KER Smooth" and "TRES 0.739L CD KER Smth", which look different at the word level but refer to the same item. In a typical P2P process, agents are asked to manually select the products which are similar before invoices are posted for payment. This step in the business process is manual, repetitive, cumbersome, and costly. Since descriptions are not well-formed sentences, we cannot apply existing semantic and syntactic text similarity approaches directly. In this paper, we present two approaches to solve the above problem using various types of available recorded agent feedback data. If the agent's feedback is in the form of a relative ranking between descriptions, we use a similarity ranking algorithm. If the agent's feedback is absolute, such as match or no-match, we use a classification similarity algorithm. We also present the threats to the validity of our approach and present a possible remedy making use of product taxonomy and catalog. We showcase the comparative effectiveness and efficiency of the proposed approaches over many benchmarks and real-world data sets.
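To make the matching problem in this abstract concrete, here is a small baseline sketch. It is not the online similarity learning method the paper proposes: it only normalises volume units and compares the resulting strings with the standard library's `difflib.SequenceMatcher`. The helper names and the unit rule (millilitres converted to litres) are assumptions chosen to fit the abstract's example.

```python
import re
from difflib import SequenceMatcher

# Matches a number followed by "ml" or "l", e.g. "739ml" or "0.739l".
_VOLUME = re.compile(r"(\d+(?:\.\d+)?)\s*(ml|l)\b")


def normalize(description: str) -> str:
    """Lowercase the text and rewrite volumes in litres (assumed rule)."""
    def to_litres(match: re.Match) -> str:
        amount = float(match.group(1))
        if match.group(2) == "ml":
            amount /= 1000.0
        return f"{amount:g}l"

    return _VOLUME.sub(to_litres, description.lower())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] after normalisation."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


invoice_item = "TRES 739mL CD KER Smooth"
po_item = "TRES 0.739L CD KER Smth"
other_item = "DOVE 250mL Soap Bar"  # made-up unrelated item

score_same = similarity(invoice_item, po_item)
score_other = similarity(invoice_item, other_item)
```

A fixed rule like this cannot adapt to new abbreviations or vendors, which is the kind of gap that learning from agent feedback, as the abstract describes, is meant to close.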

line item, precision, similarity, (17 more...)

2001.00288

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Hungary > Budapest > Budapest (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Bull, Peter, Slavitt, Isaac, Lipstein, Greg

Harnessing the Power of the Crowd to Increase Capacity for Data Science in the Social Sector

arXiv.org Machine Learning, Jun-24-2016

We present three case studies of organizations using a data science competition to answer a pressing question. The first is in education where a nonprofit that creates smart school budgets wanted to automatically tag budget line items. The second is in public health, where a low-cost, nonprofit women's health care provider wanted to understand the effect of demographic and behavioral questions on predicting which services a woman would need. The third and final example is in government innovation: using online restaurant reviews from Yelp, competitors built models to forecast which restaurants were most likely to have hygiene violations when visited by health inspectors. Finally, we reflect on the unique benefits of the open, public competition model.

data mining, machine learning, natural language, (20 more...)

1606.07781

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Genre:

Research Report (0.64)
Contests & Prizes (0.55)

Industry:

Health & Medicine > Public Health (0.91)
Health & Medicine > Consumer Health (0.70)
Education > Curriculum (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Data Science > Data Mining (0.95)
Information Technology > Artificial Intelligence > Natural Language (0.69)

Geyik, Sahin Cem, Saxena, Abhishek, Dasdan, Ali

Multi-Touch Attribution Based Budget Allocation in Online Advertising

arXiv.org Artificial Intelligence, Feb-23-2015

Budget allocation in online advertising deals with distributing the campaign (insertion order) level budgets to different sub-campaigns which employ different targeting criteria and may perform differently in terms of return-on-investment (ROI). In this paper, we present the efforts at Turn on how to best allocate campaign budget so that the advertiser or campaign-level ROI is maximized. To do this, it is crucial to be able to correctly determine the performance of sub-campaigns. This determination is highly related to the action-attribution problem, i.e. to be able to find out the set of ads, and hence the sub-campaigns that provided them to a user, that an action should be attributed to. For this purpose, we employ both last-touch (last ad gets all credit) and multi-touch (many ads share the credit) attribution methodologies. We present the algorithms deployed at Turn for the attribution problem, as well as their parallel implementation on the large advertiser performance datasets. We conclude the paper with our empirical comparison of last-touch and multi-touch attribution-based budget allocation in a real online advertising setting.
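The difference between the two attribution rules mentioned in the abstract can be illustrated with a short sketch. It is not the algorithm deployed at Turn: the equal split among touches is just one simple multi-touch rule assumed here for illustration, and the function names and the representation of a user path as a list of sub-campaign identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Path = List[str]  # sub-campaign ids of the ads a user saw before an action


def last_touch(path: Path) -> Dict[str, float]:
    """The last ad shown before the action gets all the credit."""
    return {path[-1]: 1.0} if path else {}


def multi_touch_equal(path: Path) -> Dict[str, float]:
    """Every touch gets an equal share of the credit (assumed rule)."""
    credit: Dict[str, float] = defaultdict(float)
    for sub_campaign in path:
        credit[sub_campaign] += 1.0 / len(path)
    return dict(credit)


def attribute(paths: List[Path],
              rule: Callable[[Path], Dict[str, float]]) -> Dict[str, float]:
    """Sum the per-action credit of each sub-campaign over all actions."""
    totals: Dict[str, float] = defaultdict(float)
    for path in paths:
        for sub_campaign, credit in rule(path).items():
            totals[sub_campaign] += credit
    return dict(totals)


# Made-up paths: one action preceded by two ads, one preceded by a single ad.
paths = [["search", "display"], ["display"]]
by_last_touch = attribute(paths, last_touch)          # display gets both actions
by_multi_touch = attribute(paths, multi_touch_equal)  # search gets half of one
```

Under last-touch, the "search" sub-campaign gets no credit in this toy data even though it appeared on a converting path, which is the kind of difference that would change how a budget is split between sub-campaigns.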

artificial intelligence, data mining, machine learning, (15 more...)

1502.06657

Country:

North America > United States > California > San Mateo County > Redwood City (0.04)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(3 more...)

Genre:

Research Report (0.50)
Workflow (0.47)

Industry:

Marketing (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning (0.93)