BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Wu, Jiageng, Gu, Bowen, Zhou, Ren, Xie, Kevin, Snyder, Doug, Jiang, Yixing, Carducci, Valentina, Wyss, Richard, Desai, Rishi J, Alsentzer, Emily, Celi, Leo Anthony, Rodman, Adam, Schneeweiss, Sebastian, Chen, Jonathan H., Romero-Brufau, Santiago, Lin, Kueiyu Joshua, Yang, Jie
arXiv.org Artificial Intelligence
Large language models (LLMs) hold great promise for medical applications and are evolving rapidly, with new models being released at an accelerating pace. However, benchmarking on large-scale real-world data such as electronic health records (EHRs) is critical, as clinical decisions are directly informed by these sources, yet current evaluations remain limited. Most existing benchmarks rely on medical exam-style questions or PubMed-derived text, failing to capture the complexity of real-world clinical data. Others focus narrowly on specific application scenarios, limiting their generalizability across broader clinical use. To address this gap, we present BRIDGE, a comprehensive multilingual benchmark comprising 87 tasks sourced from real-world clinical data across nine languages. It covers eight major task types spanning the entire continuum of patient care across six clinical stages and 20 representative applications, including triage and referral, consultation, information extraction, diagnosis, prognosis, and billing coding, and involves 14 clinical specialties. We systematically evaluated 95 LLMs (including DeepSeek-R1, GPT-4o, the Gemini series, and the Qwen3 series) under various inference strategies. Our results reveal substantial performance variation across model sizes, languages, natural language processing tasks, and clinical specialties. Notably, we demonstrate that open-source LLMs can achieve performance comparable to proprietary models, while medically fine-tuned LLMs based on older architectures often underperform updated general-purpose models. BRIDGE and its corresponding leaderboard serve as a foundational resource and a unique reference for the development and evaluation of new LLMs in real-world clinical text understanding. The BRIDGE leaderboard is available at https://huggingface.co/spaces/YLab-Open/BRIDGE-Medical-Leaderboard
Oct-29-2025
- Country:
- Africa
- Togo (0.04)
- Mali (0.04)
- Burkina Faso (0.04)
- Niger (0.04)
- South Africa (0.04)
- Central African Republic (0.04)
- Rwanda (0.04)
- Mozambique (0.04)
- Gabon (0.04)
- Equatorial Guinea (0.04)
- Angola (0.04)
- Senegal (0.04)
- Burundi (0.04)
- Guinea-Bissau (0.04)
- Democratic Republic of the Congo (0.04)
- Benin (0.04)
- Middle East > Djibouti (0.04)
- Côte d'Ivoire (0.04)
- Asia
- China
- Hong Kong (0.04)
- Shanghai > Shanghai (0.04)
- Zhejiang Province (0.04)
- Japan (0.04)
- Macao (0.04)
- Middle East
- Israel (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Russia (0.04)
- Singapore (0.04)
- Taiwan (0.04)
- Timor-Leste (0.04)
- Europe
- Belarus (0.04)
- Portugal > Coimbra
- Coimbra (0.04)
- United Kingdom (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Germany > Baden-Württemberg
- Tübingen Region > Tübingen (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Russia (0.04)
- France
- Norway (0.04)
- Monaco (0.04)
- Italy > Tuscany
- Florence (0.04)
- Liechtenstein (0.04)
- Austria (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- Cuba (0.04)
- Haiti (0.04)
- El Salvador (0.04)
- United States
- California > Santa Clara County
- Illinois > Champaign County
- Urbana (0.13)
- Massachusetts
- Middlesex County > Cambridge (0.04)
- Suffolk County > Boston (0.04)
- Minnesota > Olmsted County
- Rochester (0.04)
- New York > New York County
- New York City (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- Utah > Salt Lake County
- Salt Lake City (0.04)
- Jamaica (0.04)
- Mexico (0.04)
- Guatemala (0.04)
- Nicaragua (0.04)
- Honduras (0.04)
- Dominican Republic (0.04)
- Costa Rica (0.04)
- Panama (0.04)
- Trinidad and Tobago (0.04)
- Oceania
- Australia > New South Wales
- Sydney (0.04)
- New Zealand (0.04)
- South America
- Uruguay (0.04)
- Argentina (0.04)
- Paraguay (0.04)
- Brazil (0.04)
- Bolivia (0.04)
- Colombia > Meta Department
- Villavicencio (0.04)
- Ecuador (0.04)
- Chile > Santiago Metropolitan Region
- Santiago Province > Santiago (0.04)
- Venezuela (0.04)
- Peru (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Strength High (1.00)
- Industry:
- Health & Medicine
- Consumer Health (1.00)
- Diagnostic Medicine > Imaging (1.00)
- Government Relations & Public Policy (0.92)
- Health Care Providers & Services (1.00)
- Health Care Technology > Medical Record (1.00)
- Pharmaceuticals & Biotechnology (1.00)
- Therapeutic Area
- Cardiology/Vascular Diseases (1.00)
- Endocrinology > Diabetes (0.92)
- Gastroenterology (1.00)
- Infections and Infectious Diseases (1.00)
- Neurology (1.00)
- Oncology (1.00)
- Pulmonary/Respiratory Diseases (1.00)
- Technology: