
Smirk: An Atomically Complete Tokenizer for Molecular Foundation Models

Wadell, Alexius, Bhutani, Anoushka, Viswanathan, Venkatasubramanian

arXiv.org Artificial Intelligence

Molecular Foundation Models are emerging as powerful tools for accelerating molecular design, materials science, and cheminformatics, leveraging transformer architectures to speed up the discovery of new materials and drugs while reducing the computational cost of traditional ab initio methods. However, current models are constrained by closed-vocabulary tokenizers that fail to capture the full diversity of molecular structures. In this work, we systematically evaluate thirteen chemistry-specific tokenizers for their coverage of the SMILES language, uncovering substantial gaps. Using N-gram language models, we assessed the impact of tokenizer choice on model performance and quantified the information loss of unknown tokens. We introduce two new tokenizers, smirk and smirk-gpe, which can represent the entirety of the OpenSMILES specification while avoiding the pitfalls of existing tokenizers. Our work highlights the importance of open-vocabulary modeling for molecular foundation models and the need for chemically diverse benchmarks for cheminformatics.
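To illustrate the closed-vocabulary problem the abstract describes, the sketch below uses a regex-based, atom-wise SMILES tokenizer of the kind commonly found in the literature (this is an illustrative toy, not the smirk tokenizer itself; the regex and the `<unk>` fallback are assumptions for demonstration). Any character the pattern cannot match, such as the `*` wildcard atom, falls outside the vocabulary and degrades to an unknown token:

```python
import re

# A common regex-based, atom-wise SMILES tokenizer (illustrative only;
# not the smirk tokenizer described in the paper). Two-letter elements
# must precede single letters so "Cl" is not split into "C" + "l".
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]|@@?|=|#|\(|\)|\.|/|\\|\+|-|%\d{2}|\d)"
)

def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; uncovered characters become '<unk>'."""
    tokens, pos = [], 0
    for m in SMILES_TOKEN.finditer(smiles):
        if m.start() != pos:  # a gap: characters the vocabulary cannot represent
            tokens.append("<unk>")
        tokens.append(m.group())
        pos = m.end()
    if pos != len(smiles):  # trailing uncovered characters
        tokens.append("<unk>")
    return tokens

print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin: fully covered, no <unk>
print(tokenize("C*C"))                    # wildcard atom collapses to <unk>
```

Each `<unk>` discards the underlying chemistry, which is the information loss the authors quantify; an open-vocabulary tokenizer covering the full OpenSMILES grammar avoids this failure mode by construction.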


Ergo, SMIRK is Safe: A Safety Case for a Machine Learning Component in a Pedestrian Automatic Emergency Brake System

Borg, Markus, Henriksson, Jens, Socha, Kasper, Lennartsson, Olof, Lönegren, Elias Sonnsjö, Bui, Thanh, Tomaszewski, Piotr, Sathyamoorthy, Sankar Raman, Brink, Sebastian, Moghadam, Mahshid Helali

arXiv.org Artificial Intelligence

Machine Learning (ML) is increasingly used in critical applications, e.g., supervised learning using Deep Neural Networks (DNN) to support automotive perception. Software systems developed for safety-critical applications must undergo assessments to demonstrate compliance with functional safety standards. However, as the conventional safety standards are not fully applicable to ML-enabled systems (Salay et al., 2018; Tambon et al., 2022), several domain-specific initiatives aim to complement them, e.g., organized by the EU Aviation Safety Agency, the ITU-WHO Focus Group on AI for Health, and the International Organization for Standardization. In the automotive industry, several standardization initiatives are ongoing to allow safe use of ML in road vehicles. It is evident that established functional safety as defined in ISO 26262 Functional Safety (FuSa) is no longer sufficient for the next generation of Advanced Driver-Assistance Systems (ADAS) and Autonomous Driving (AD). One complementary standard under development is ISO 21448 Safety of the Intended Functionality (SOTIF), which aims for absence of unreasonable risk due to hazards resulting from functional insufficiencies.


Exploring the Assessment List for Trustworthy AI in the Context of Advanced Driver-Assistance Systems

Borg, Markus, Bronson, Joshua, Christensson, Linus, Olsson, Fredrik, Lennartsson, Olof, Sonnsjö, Elias, Ebabi, Hamid, Karsberg, Martin

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) is increasingly used in critical applications. Thus, the need for dependable AI systems is rapidly growing. In 2018, the European Commission appointed experts to a High-Level Expert Group on AI (AI-HLEG). AI-HLEG defined Trustworthy AI as 1) lawful, 2) ethical, and 3) robust and specified seven corresponding key requirements. To help development organizations, AI-HLEG recently published the Assessment List for Trustworthy AI (ALTAI). We present an illustrative case study from applying ALTAI to an ongoing development project of an Advanced Driver-Assistance System (ADAS) that relies on Machine Learning (ML). Our experience shows that ALTAI is largely applicable to ADAS development, but specific parts related to human agency and transparency can be disregarded. Moreover, bigger questions related to societal and environmental impact cannot be tackled by an ADAS supplier in isolation. We present how we plan to develop the ADAS to ensure ALTAI-compliance. Finally, we provide three recommendations for the next revision of ALTAI, i.e., life-cycle variants, domain-specific adaptations, and removed redundancy.