PHM-Bench: A Domain-Specific Benchmarking Framework for Systematic Evaluation of Large Models in Prognostics and Health Management

Yang, Puyu, Tao, Laifa, Huang, Zijian, Liu, Haifei, Cao, Wenyan, Ji, Hao, Qiu, Jianan, Huang, Qixuan, Su, Xuanyuan, Xie, Yuhang, Zhang, Jun, Li, Shangyu, Lu, Chen, Lian, Zhixuan

Aug-5-2025–arXiv.org Artificial Intelligence

With the rapid advancement of generative artificial intelligence, large language models (LLMs) are increasingly adopted in industrial domains, offering new opportunities for Prognostics and Health Management (PHM), addressing challenges such as high development costs, long deployment cycles, and limited generalizability. However, despite the growing synergy between PHM and LLMs, existing evaluation methodologies often fall short regarding structural completeness, dimensional comprehensiveness, and evaluation granularity, severely hampering the in-depth integration of LLMs into the PHM domain. To address these limitations, this study, drawing upon two decades of PHM research and recent advancements in LLM-driven PHM systems, proposes PHM-Bench, a novel three-dimensional evaluation framework for PHM-oriented large models. Grounded in the triadic structure of fundamental capability, core task, and entire lifecycle, PHM-Bench is designed specifically for the unique demands of PHM system engineering. It systematically defines multi-level evaluation metrics spanning knowledge comprehension, algorithmic generation, task optimization, etc., aligning with typical PHM tasks including condition monitoring, fault diagnosis, fault & RUL prediction, and maintenance decision-making, thus establishing a comprehensive assessment mechanism, bridging complex engineering systems' design, development, and operational stages. Utilizing both self-constructed case sets and publicly available industrial datasets, PHM-Bench enables multi-dimensional evaluation of general-purpose and domain-specific models across diverse PHM tasks. Experiments demonstrate its effectiveness in revealing model capabilities and limitations, distinguishing performance across tasks, and providing a unified baseline for model development and optimization. PHM-Bench lays the methodological foundation for industrial-scale assessment of LLMs in PHM and offers a critical benchmark to guide the transition from general-purpose to PHM-specialized models.

dimension, large language model, machine learning

Country:
- Asia > China (0.46)
- North America > United States (0.28)

Genre:
- Research Report (0.82)

Industry:
- Health & Medicine > Consumer Health (1.00)
- Energy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
