Provably safe certification for machine learning models under adversarial attacks: Interview with Chen Feng

AIHub 

In their work, PROSAC: Provably Safe Certification for Machine Learning Models under Adversarial Attacks, presented at AAAI 2025, Chen Feng, Ziquan Liu, Zhuo Zhi, Ilija Bogunovic, Carsten Gerner-Beuerle, and Miguel Rodrigues developed a new way to certify the performance of machine learning models under adversarial attacks, with population-level risk guarantees. Here, Chen tells us more about their methodology, the main findings, and some of the implications of this work.

This paper focuses on making machine learning models safer against adversarial attacks: sneaky tweaks to data, such as altering an image just enough to trick an AI into misclassifying it. We developed a new approach called PROSAC, which stands for PROvably SAfe Certification. It's a way to test and certify that a model can hold up under any kind of attack, not just a few specific ones.
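To make the idea of an adversarial attack concrete, here is a minimal sketch (in PyTorch, not taken from the paper) of the classic fast gradient sign method: it nudges each pixel of an image by a small amount epsilon in whichever direction increases the classifier's loss, which is often enough to flip the prediction while remaining nearly imperceptible to a human. The function name, the epsilon value, and the assumption of a batched image with pixels in [0, 1] are illustrative; PROSAC itself is attack-agnostic rather than tied to any one attack like this.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """One-step fast gradient sign method (FGSM) perturbation.

    Illustrative example only: `epsilon` and the [0, 1] pixel range are
    assumptions, and `model` is any classifier returning logits.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step each pixel by +/- epsilon in the direction that increases the
    # loss, then clip back to the valid image range.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```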