AITopics | specialty

Collaborating Authors

specialty

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts

Neural Information Processing SystemsDec-24-2025, 18:12:10 GMT

In this paper, we tackle the problem of domain shift. Most existing methods perform training on multiple source domains using a single model, and the same trained model is used on all unseen target domains. Such solutions are sub-optimal as each target domain exhibits its own specialty, which is not adapted. Furthermore, expecting single-model training to learn extensive knowledge from multiple source domains is counterintuitive. The model is more biased toward learning only domain-invariant features and may result in negative knowledge transfer.

artificial intelligence, machine learning, proceedings, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.83)

Add feedback

A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties

Wang, Jinghao, Zhang, Ping, Yagemann, Carter

arXiv.org Artificial IntelligenceDec-10-2025

Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties, yet systematic evaluation of their robustness to adversarial misuse and privacy leakage remains inaccessible to most researchers. Existing security benchmarks require GPU clusters, commercial API access, or protected health data -- barriers that limit community participation in this critical research area. We propose a practical, fully reproducible framework for evaluating medical AI security under realistic resource constraints. Our framework design covers multiple medical specialties stratified by clinical risk -- from high-risk domains such as emergency medicine and psychiatry to general practice -- addressing jailbreaking attacks (role-playing, authority impersonation, multi-turn manipulation) and privacy extraction attacks. All evaluation utilizes synthetic patient records requiring no IRB approval. The framework is designed to run entirely on consumer CPU hardware using freely available models, eliminating cost barriers. We present the framework specification including threat models, data generation methodology, evaluation protocols, and scoring rubrics. This proposal establishes a foundation for comparative security assessment of medical-specialist models and defense mechanisms, advancing the broader goal of ensuring safe and trustworthy medical AI systems.

large language model, machine learning, neural information processing system, (15 more...)

arXiv.org Artificial Intelligence

2512.08185

Country: North America > United States > Ohio (0.04)

Genre: Research Report > Experimental Study (0.35)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.90)
Health & Medicine > Therapeutic Area (0.69)
Government > Regional Government > North America Government > United States Government (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

CoT-X: An Adaptive Framework for Cross-Model Chain-of-Thought Transfer and Optimization

Bi, Ziqian, Chen, Kaijie, Wang, Tianyang, Hao, Junfeng, Peng, Benji, Song, Xinyuan

arXiv.org Artificial IntelligenceDec-3-2025

Chain-of-Thought (CoT) reasoning enhances the problem-solving ability of large language models (LLMs) but leads to substantial inference overhead, limiting deployment in resource-constrained settings. This paper investigates efficient CoT transfer across models of different scales and architectures through an adaptive reasoning summarization framework. The proposed method compresses reasoning traces via semantic segmentation with importance scoring, budget-aware dynamic compression, and coherence reconstruction, preserving critical reasoning steps while significantly reducing token usage. Experiments on 7{,}501 medical examination questions across 10 specialties show up to 40% higher accuracy than truncation under the same token budgets. Evaluations on 64 model pairs from eight LLMs (1.5B-32B parameters, including DeepSeek-R1 and Qwen3) confirm strong cross-model transferability. Furthermore, a Gaussian Process-based Bayesian optimization module reduces evaluation cost by 84% and reveals a power-law relationship between model size and cross-domain robustness. These results demonstrate that reasoning summarization provides a practical path toward efficient CoT transfer, enabling advanced reasoning under tight computational constraints. Code will be released upon publication.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.05747

Country:

North America > United States > Ohio (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.68)
Health & Medicine > Diagnostic Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

One Patient, Many Contexts: Scaling Medical AI with Contextual Intelligence

Li, Michelle M., Reis, Ben Y., Rodman, Adam, Cai, Tianxi, Dagan, Noa, Balicer, Ran D., Loscalzo, Joseph, Kohane, Isaac S., Zitnik, Marinka

arXiv.org Artificial IntelligenceDec-1-2025

Medical AI, including clinical language models, vision-language models, and multimodal health record models, already summarizes notes, answers questions, and supports decisions. Their adaptation to new populations, specialties, or care settings often relies on fine-tuning, prompting, or retrieval from external knowledge bases. These strategies can scale poorly and risk contextual errors: outputs that appear plausible but miss critical patient or situational information. We envision context switching as a solution. Context switching adjusts model reasoning at inference without retraining. Generative models can tailor outputs to patient biology, care setting, or disease. Multimodal models can reason on notes, laboratory results, imaging, and genomics, even when some data are missing or delayed. Agent models can coordinate tools and roles based on tasks and users. In each case, context switching enables medical AI to adapt across specialties, populations, and geographies. It requires advances in data design, model architectures, and evaluation frameworks, and establishes a foundation for medical AI that scales to infinitely many contexts while remaining reliable and suited to real-world care.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.10157

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.05)
Asia > Middle East > Israel (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
(2 more...)

Add feedback

MedBench v4: A Robust and Scalable Benchmark for Evaluating Chinese Medical Language Models, Multimodal Models, and Intelligent Agents

Ding, Jinru, Lu, Lu, Ding, Chao, Bian, Mouxiao, Chen, Jiayuan, Pang, Wenrao, Chen, Ruiyao, Peng, Xinwei, Lu, Renjie, Ren, Sijie, Zhu, Guanxu, Wu, Xiaoqin, Liu, Zhiqiang, Zhang, Rongzhao, Jiang, Luyi, Han, Bing, Wang, Yunqiu, Xu, Jie

arXiv.org Artificial IntelligenceNov-20-2025

Recent advances in medical large language models (LLMs), multimodal models, and agents demand evaluation frameworks that reflect real clinical workflows and safety constraints. We present MedBench v4, a nationwide, cloud-based benchmarking infrastructure comprising over 700,000 expert-curated tasks spanning 24 primary and 91 secondary specialties, with dedicated tracks for LLMs, multimodal models, and agents. Items undergo multi-stage refinement and multi-round review by clinicians from more than 500 institutions, and open-ended responses are scored by an LLM-as-a-judge calibrated to human ratings. We evaluate 15 frontier models. Base LLMs reach a mean overall score of 54.1/100 (best: Claude Sonnet 4.5, 62.5/100), but safety and ethics remain low (18.4/100). Multimodal models perform worse overall (mean 47.5/100; best: GPT-5, 54.9/100), with solid perception yet weaker cross-modal reasoning. Agents built on the same backbones substantially improve end-to-end performance (mean 79.8/100), with Claude Sonnet 4.5-based agents achieving up to 85.3/100 overall and 88.9/100 on safety tasks. MedBench v4 thus reveals persisting gaps in multimodal reasoning and safety for base models, while showing that governance-aware agentic orchestration can markedly enhance benchmarked clinical readiness without sacrificing capability. By aligning tasks with Chinese clinical guidelines and regulatory priorities, the platform offers a practical reference for hospitals, developers, and policymakers auditing medical AI.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2511.14439

Country:

Asia > China > Shanghai > Shanghai (0.05)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Asking the Right Questions: Benchmarking Large Language Models in the Development of Clinical Consultation Templates

McCoy, Liam G., Haredasht, Fateme Nateghi, Chopra, Kanav, Wu, David, Wu, David JH, Conteh, Abass, Khemani, Sarita, Maharaj, Saloni Kumar, Ravi, Vishnu, Pahwa, Arth, Weng, Yingjie, Rosengaus, Leah, Giang, Lena, Li, Kelvin Zhenghao, Jee, Olivia, Shirvani, Daniel, Goh, Ethan, Chen, Jonathan H.

arXiv.org Artificial IntelligenceNov-13-2025

This study evaluates the capacity of large language models (LLMs) to generate structured clinical consultation templates for electronic consultation. Using 145 expert-crafted templates developed and routinely used by Stanford's eConsult team, we assess frontier models -- including o3, GPT-4o, Kimi K2, Claude 4 Sonnet, Llama 3 70B, and Gemini 2.5 Pro -- for their ability to produce clinically coherent, concise, and prioritized clinical question schemas. Through a multi-agent pipeline combining prompt optimization, semantic autograding, and prioritization analysis, we show that while models like o3 achieve high comprehensiveness (up to 92.2\%), they consistently generate excessively long templates and fail to correctly prioritize the most clinically important questions under length constraints. Performance varies across specialties, with significant degradation in narrative-driven fields such as psychiatry and pain medicine. Our findings demonstrate that LLMs can enhance structured clinical information exchange between physicians, while highlighting the need for more robust evaluation methods that capture a model's ability to prioritize clinically salient information within the time constraints of real-world physician communication.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.01159

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.16)
North America > United States > California > Santa Clara County > Stanford (0.05)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Gender Bias in Large Language Models for Healthcare: Assignment Consistency and Clinical Implications

Liu, Mingxuan, Ke, Yuhe, Zhu, Wentao, Mertens, Mayli, Ning, Yilin, Liao, Jingchi, Hong, Chuan, Ting, Daniel Shu Wei, Peng, Yifan, Bitterman, Danielle S., Ong, Marcus Eng Hock, Liu, Nan

arXiv.org Artificial IntelligenceOct-13-2025

The integration of large language models (LLMs) into healthcare holds promise to enhance clinical decision-making, yet their susceptibility to biases remains a critical concern. Gender has long influenced physician behaviors and patient outcomes, raising concerns that LLMs assuming human-like roles, such as clinicians or medical educators, may replicate or amplify gender-related biases. Using case studies from the New England Journal of Medicine Challenge (NEJM), we assigned genders (female, male, or unspecified) to multiple open-source and proprietary LLMs. We evaluated their response consistency across LLM-gender assignments regarding both LLM-based diagnosis and models' judgments on the clinical relevance or necessity of patient gender. In our findings, diagnoses were relatively consistent across LLM genders for most models. However, for patient gender's relevance and necessity in LLM-based diagnosis, all models demonstrated substantial inconsistency across LLM genders, particularly for relevance judgements. Some models even displayed a systematic female-male disparity in their interpretation of patient gender. These findings present an underexplored bias that could undermine the reliability of LLMs in clinical practice, underscoring the need for routine checks of identity-assignment consistency when interacting with LLMs to ensure reliable and equitable AI-supported clinical care.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.08614

Country:

Asia > Singapore > Central Region > Singapore (0.05)
Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.05)
North America > United States > North Carolina > Durham County > Durham (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Mapping Patient-Perceived Physician Traits from Nationwide Online Reviews with LLMs

Luo, Junjie, Han, Rui, Welivita, Arshana, Di, Zeleikun, Wu, Jingfu, Zhi, Xuzhe, Agarwal, Ritu, Gao, Gordon

arXiv.org Artificial IntelligenceOct-7-2025

Interpersonal and professional qualities of physicians profoundly shape patient trust, communication, adherence, and health outcomes [1, 2]. Understanding these qualities from the patient's perspective is essential to advancing patient-centered care, yet current measurement tools--such as standardized surveys or aggregate star ratings--capture only a narrow view of the physician-patient relationship. In parallel, millions of online physician reviews now provide an abundant, patient-generated record of real-world experiences, offering an unprecedented opportunity to examine how physicians are perceived in everyday practice [3, 4, 5, 6]. Extracting clinically meaningful information from such narrative data remains challenging. Prior studies have typically relied on sentiment analysis or topic modeling, approaches that overlook the multidimensional nature of patient perceptions. Well-established frameworks from psychology, such as the Big Five personality traits [7], offer interpretable constructs for describing interpersonal style, but have rarely been operationalized at scale in healthcare settings [8]. Similarly, healthcare-specific qualities--communication effectiveness, perceived competence, attentiveness to outcomes, and trustworthiness--are widely recognized as central to care quality but are difficult to measure systematically. Manual coding of these traits is costly, inconsistent, and infeasible for national datasets. Recent advances in large language models (LLMs) enable a new approach [9].

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.03997

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > United States > Indiana (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Performance of Large Language Models in Answering Critical Care Medicine Questions

Alwakeel, Mahmoud, Nagori, Aditya, Wong, An-Kwok Ian, Chaisson, Neal, Krishnamoorthy, Vijay, Kamaleswaran, Rishikesan

arXiv.org Artificial IntelligenceSep-25-2025

Abstract: Large Language Models have been tested on medical student-level questions, but their performance in specialized fields like Critical Care Medicine (CCM) is less explored. This study evaluated Meta-Llama 3.1 models (8B and 70B parameters) on 871 CCM questions. Performance varied across domains, highest in Research (68.4%) and lowest in Renal (47.9%), highlighting the need for broader future work to improve models across various subspecialty domains. Introduction: The use of Large Language Models (LLMs) to answer medical exam - style questions has gained popularity in recent years. This study aims to evaluate the performance of LLMs in answering subspecialty CCM board exam - style questions.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2509.19344

Country:

North America > United States > Ohio > Cuyahoga County > Cleveland (0.05)
North America > United States > North Carolina > Durham County > Durham (0.05)

Genre: Research Report (0.95)

Industry: Health & Medicine > Therapeutic Area (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

"What's Up, Doc?": Analyzing How Users Seek Health Information in Large-Scale Conversational AI Datasets

Paruchuri, Akshay, Aziz, Maryam, Vartak, Rohit, Ali, Ayman, Uchehara, Best, Liu, Xin, Chatterjee, Ishan, Agrawal, Monica

arXiv.org Artificial IntelligenceSep-23-2025

People are increasingly seeking healthcare information from large language models (LLMs) via interactive chatbots, yet the nature and inherent risks of these conversations remain largely unexplored. In this paper, we filter large-scale conversational AI datasets to achieve HealthChat-11K, a curated dataset of 11K real-world conversations composed of 25K user messages. We use HealthChat-11K and a clinician-driven taxonomy for how users interact with LLMs when seeking healthcare information in order to systematically study user interactions across 21 distinct health specialties. Our analysis reveals insights into the nature of how and why users seek health information, such as common interactions, instances of incomplete context, affective behaviors, and interactions (e.g., leading questions) that can induce sycophancy, underscoring the need for improvements in the healthcare support capabilities of LLMs deployed as conversational AI. Code and artifacts to retrieve our analyses and combine them into a curated dataset can be found here: https://github.com/yahskapar/HealthChat

information, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.21532

Country:

North America > United States (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Consumer Health (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback