Collaborating Authors

 Soni, Jitendra


Autonomous Microscopy Experiments through Large Language Model Agents

arXiv.org Artificial Intelligence

The emergence of large language models (LLMs) has accelerated the development of self-driving laboratories (SDLs) for materials research. Despite their transformative potential, current SDL implementations rely on rigid, predefined protocols that limit their adaptability to dynamic experimental scenarios across different labs. A significant challenge persists in measuring how effectively AI agents can replicate the adaptive decision-making and experimental intuition of expert scientists. Here, we introduce AILA (Artificially Intelligent Lab Assistant), a framework that automates atomic force microscopy (AFM) through LLM-driven agents. Using AFM as an experimental testbed, we develop AFMBench, a comprehensive evaluation suite that challenges AI agents based on language models like GPT-4o and GPT-3.5 to perform tasks spanning the scientific workflow, from experimental design to results analysis. Our systematic assessment shows that state-of-the-art language models struggle even with basic tasks such as documentation retrieval, leading to a significant decline in performance in multi-agent coordination scenarios. Further, we observe that LLMs tend not to adhere to instructions, or even divagate to additional tasks beyond the original request, raising serious concerns about the safety alignment of AI agents for SDLs. Finally, we demonstrate the application of AILA on increasingly complex, open-ended experiments: automated AFM calibration, high-resolution feature detection, and mechanical property measurement. Our findings emphasize the necessity for stringent benchmarking protocols before deploying AI agents as laboratory assistants across scientific disciplines.