DependEval: Benchmarking LLMs for Repository Dependency Understanding

Junjia Du, Yadi Liu, Hongcheng Guo, Jiawei Wang, Haojian Huang, Yunyi Ni, Zhoujun Li

arXiv.org Artificial Intelligence 

While large language models (LLMs) have shown considerable promise in code generation, real-world software development demands advanced repository-level reasoning, including understanding dependencies, navigating project structures, and managing multi-file changes. However, the ability of LLMs to effectively comprehend and handle complex code repositories has yet to be fully explored. To address these challenges, we introduce DependEval, a hierarchical benchmark designed to evaluate repository dependency understanding. The benchmark is built on 15,576 repositories collected from real-world websites and evaluates models on three core tasks: Dependency Recognition, Repository Construction, and Multi-file Editing, across 8 programming languages drawn from actual code repositories. Our evaluation of over 25 LLMs reveals substantial performance gaps and provides valuable insights into repository-level code understanding.
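To make the evaluation setup concrete, the sketch below shows one way a single Dependency Recognition item could be represented and scored with an exact-match criterion. This is a minimal illustrative assumption, not the actual DependEval data format or metric; all field names and the ordering-based check are hypothetical.

```python
# Hypothetical sketch (not the actual DependEval schema or metric): one
# Dependency Recognition item and a simple exact-match scorer.
from dataclasses import dataclass, field
from typing import List


@dataclass
class DependencyRecognitionItem:
    repo_name: str                      # repository the files come from
    language: str                       # one of the benchmark's 8 languages
    files: List[str]                    # file paths shown to the model
    gold_order: List[str] = field(default_factory=list)  # reference dependency order


def exact_match(predicted: List[str], gold: List[str]) -> float:
    """Return 1.0 if the predicted dependency order matches the reference exactly."""
    return 1.0 if predicted == gold else 0.0


if __name__ == "__main__":
    item = DependencyRecognitionItem(
        repo_name="example/repo",
        language="Python",
        files=["utils.py", "core.py", "app.py"],
        gold_order=["utils.py", "core.py", "app.py"],  # utils -> core -> app
    )
    prediction = ["utils.py", "core.py", "app.py"]     # stand-in model output
    print(exact_match(prediction, item.gold_order))    # -> 1.0
```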