AITopics | bpm task

WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks

Neural Information Processing SystemsMar-22-2026, 15:01:46 GMT

Existing ML benchmarks lack the depth and diversity of annotations needed for evaluating models on business process management (BPM) tasks. BPM is the practice of documenting, measuring, improving, and automating enterprise workflows. However, research has focused almost exclusively on one task -- full end-to-end automation using agents based on multimodal foundation models (FMs) like GPT-4. This focus on automation ignores the reality of how most BPM tools are applied today -- simply documenting the relevant workflow takes 60% of the time of the typical process optimization project. To address this gap we present WONDERBREAD, the first benchmark for evaluating multimodal FMs on BPM tasks beyond automation. Our contributions are: (1) a dataset containing 2928 documented workflow demonstrations; (2) 6 novel BPM tasks sourced from real-world applications ranging from workflow documentation to knowledge transfer to process improvement; and (3) an automated evaluation harness. Our benchmark shows that while state-of-the-art FMs can automatically generate documentation (e.g.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks

Neural Information Processing SystemsMay-27-2025, 17:48:53 GMT

Existing ML benchmarks lack the depth and diversity of annotations needed for evaluating models on business process management (BPM) tasks. BPM is the practice of documenting, measuring, improving, and automating enterprise workflows. However, research has focused almost exclusively on one task -- full end-to-end automation using agents based on multimodal foundation models (FMs) like GPT-4. This focus on automation ignores the reality of how most BPM tools are applied today -- simply documenting the relevant workflow takes 60% of the time of the typical process optimization project. To address this gap we present WONDERBREAD, the first benchmark for evaluating multimodal FMs on BPM tasks beyond automation.

business process management task, multimodal foundation model, wonderbread, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Architecture > Enterprise Architecture (0.63)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Add feedback

Towards a Benchmark for Large Language Models for Business Process Management Tasks

Busch, Kiran, Leopold, Henrik

arXiv.org Artificial IntelligenceOct-13-2024

An increasing number of organizations are deploying Large Language Models (LLMs) for a wide range of tasks. Despite their general utility, LLMs are prone to errors, ranging from inaccuracies to hallucinations. To objectively assess the capabilities of existing LLMs, performance benchmarks are conducted. However, these benchmarks often do not translate to more specific real-world tasks. This paper addresses the gap in benchmarking LLM performance in the Business Process Management (BPM) domain. Currently, no BPM-specific benchmarks exist, creating uncertainty about the suitability of different LLMs for BPM tasks. This paper systematically compares LLM performance on four BPM tasks focusing on small open-source models. The analysis aims to identify task-specific performance variations, compare the effectiveness of open-source versus commercial models, and assess the impact of model size on BPM task performance. This paper provides insights into the practical applications of LLMs in BPM, guiding organizations in selecting appropriate models for their specific needs.

bpm task, constraint, llm, (16 more...)

arXiv.org Artificial Intelligence

2410.03255

Country:

North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
Asia > Middle East > UAE (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre:

Workflow (0.93)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Add feedback

Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks

Wornow, Michael, Narayan, Avanika, Viggiano, Ben, Khare, Ishan S., Verma, Tathagat, Thompson, Tibor, Hernandez, Miguel Angel Fuentes, Sundar, Sudharsan, Trujillo, Chloe, Chawla, Krrish, Lu, Rongfei, Shen, Justin, Nagaraj, Divya, Martinez, Joshua, Agrawal, Vardhan, Hudson, Althea, Shah, Nigam H., Re, Christopher

arXiv.org Artificial IntelligenceJun-19-2024

Existing ML benchmarks lack the depth and diversity of annotations needed for evaluating models on business process management (BPM) tasks. BPM is the practice of documenting, measuring, improving, and automating enterprise workflows. However, research has focused almost exclusively on one task - full end-to-end automation using agents based on multimodal foundation models (FMs) like GPT-4. This focus on automation ignores the reality of how most BPM tools are applied today - simply documenting the relevant workflow takes 60% of the time of the typical process optimization project. To address this gap we present WONDERBREAD, the first benchmark for evaluating multimodal FMs on BPM tasks beyond automation. Our contributions are: (1) a dataset containing 2928 documented workflow demonstrations; (2) 6 novel BPM tasks sourced from real-world applications ranging from workflow documentation to knowledge transfer to process improvement; and (3) an automated evaluation harness. Our benchmark shows that while state-of-the-art FMs can automatically generate documentation (e.g. recalling 88% of the steps taken in a video demonstration of a workflow), they struggle to re-apply that knowledge towards finer-grained validation of workflow completion (F1 < 0.3). We hope WONDERBREAD encourages the development of more "human-centered" AI tooling for enterprise applications and furthers the exploration of multimodal FMs for the broader universe of BPM tasks. We publish our dataset and experiments here: https://github.com/HazyResearch/wonderbread

demonstration, sop, workflow, (15 more...)

arXiv.org Artificial Intelligence

2406.13264

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany (0.04)
(3 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (0.46)

Industry:

Information Technology > Software (0.46)
Health & Medicine > Health Care Providers & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Large Language Models can accomplish Business Process Management Tasks

Grohs, Michael, Abb, Luka, Elsayed, Nourhan, Rehse, Jana-Rebecca

arXiv.org Artificial IntelligenceJul-19-2023

Business Process Management (BPM) aims to improve organizational activities and their outcomes by managing the underlying processes. To achieve this, it is often necessary to consider information from various sources, including unstructured textual documents. Therefore, researchers have developed several BPM-specific solutions that extract information from textual documents using Natural Language Processing techniques. These solutions are specific to their respective tasks and cannot accomplish multiple process-related problems as a general-purpose instrument. However, in light of the recent emergence of Large Language Models (LLMs) with remarkable reasoning capabilities, such a general-purpose instrument with multiple applications now appears attainable. In this paper, we illustrate how LLMs can accomplish text-related BPM tasks by applying a specific LLM to three exemplary tasks: mining imperative process models from textual descriptions, mining declarative process models from textual descriptions, and assessing the suitability of process tasks from textual descriptions for robotic process automation. We show that, without extensive configuration or prompt engineering, LLMs perform comparably to or better than existing solutions and discuss implications for future BPM research as well as practical usage.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2307.09923

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Just Tell Me: Prompt Engineering in Business Process Management

Busch, Kiran, Rochlitzer, Alexander, Sola, Diana, Leopold, Henrik

arXiv.org Artificial IntelligenceApr-14-2023

GPT-3 and several other language models (LMs) can effectively address various natural language processing (NLP) tasks, including machine translation and text summarization. Recently, they have also been successfully employed in the business process management (BPM) domain, e.g., for predictive process monitoring and process extraction from text. This, however, typically requires fine-tuning the employed LM, which, among others, necessitates large amounts of suitable training data. A possible solution to this problem is the use of prompt engineering, which leverages pre-trained LMs without fine-tuning them. Recognizing this, we argue that prompt engineering can help bring the capabilities of LMs to BPM research. We use this position paper to develop a research agenda for the use of prompt engineering for BPM research by identifying the associated potentials and challenges.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2304.07183

Country: Europe > Germany > Hamburg (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Filters

Collaborating Authors

bpm task

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks

WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks

Towards a Benchmark for Large Language Models for Business Process Management Tasks

Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks

Large Language Models can accomplish Business Process Management Tasks

Just Tell Me: Prompt Engineering in Business Process Management