LLM Agents for Generating Microservice-based Applications: how complex is your specification?

Oct-28-2025–arXiv.org Artificial Intelligence

In this paper we evaluate the capabilities of LLM Agents in generating code for real-world problems. Specifically, we explore code synthesis for microservice-based applications, a widely used architectural pattern for building applications. We define a standard template for specifying these applications, and we propose a metric for scoring the difficulty of a specification. The higher the score, the more difficult it is to generate code for the specification. Our experimental results show that agents using strong LLMs (like GPT-3o-mini) do fairly well on medium difficulty specifications but do poorly on those of higher difficulty levels. This is due to more intricate business logic, a greater use of external services, database integration and inclusion of non-functional capabilities such as authentication. We analyzed the errors in LLM-synthesized code and report on the key challenges LLM Agents face in generating code for these specifications. Finally, we show that using a fine-grained approach to code generation improves the correctness of the generated code.

experiment, large language model, natural language, (20 more...)

arXiv.org Artificial Intelligence

Oct-28-2025

arXiv.org PDF

Add feedback

Country:
- Asia (0.46)
- Europe > Austria (0.28)
- North America > United States (0.28)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Consumer Products & Services > Restaurants (0.47)
- Information Technology (0.35)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found