practical framework
A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties
Wang, Jinghao, Zhang, Ping, Yagemann, Carter
Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties, yet systematic evaluation of their robustness to adversarial misuse and privacy leakage remains inaccessible to most researchers. Existing security benchmarks require GPU clusters, commercial API access, or protected health data -- barriers that limit community participation in this critical research area. We propose a practical, fully reproducible framework for evaluating medical AI security under realistic resource constraints. Our framework design covers multiple medical specialties stratified by clinical risk -- from high-risk domains such as emergency medicine and psychiatry to general practice -- addressing jailbreaking attacks (role-playing, authority impersonation, multi-turn manipulation) and privacy extraction attacks. All evaluation utilizes synthetic patient records requiring no IRB approval. The framework is designed to run entirely on consumer CPU hardware using freely available models, eliminating cost barriers. We present the framework specification including threat models, data generation methodology, evaluation protocols, and scoring rubrics. This proposal establishes a foundation for comparative security assessment of medical-specialist models and defense mechanisms, advancing the broader goal of ensuring safe and trustworthy medical AI systems.
From Classical to Hybrid: A Practical Framework for Quantum-Enhanced Learning
Illésová, Silvie, Bezděk, Tomáš, Novák, Vojtěch, Zelinka, Ivan, Cacciatore, Stefano, Beseda, Martin
This work addresses the challenge of enabling practitioners without quantum expertise to transition from classical to hybrid quantum-classical machine learning workflows. We propose a three-stage framework: starting with a classical self-training model, then introducing a minimal hybrid quantum variant, and finally applying diagnostic feedback via QMetric to refine the hybrid architecture. In experiments on the Iris dataset, the refined hybrid model improved accuracy from 0.31 in the classical approach to 0.87 in the quantum approach. These results suggest that even modest quantum components, when guided by proper diagnostics, can enhance class separation and representation capacity in hybrid learning, offering a practical pathway for classical machine learning practitioners to leverage quantum-enhanced methods.
Practical Lessons on Optimizing Sponsored Products in eCommerce
Xue, Yanbing, Liu, Bo, Du, Weizhi, Korlimarla, Jayanth, Men, Musen
In this paper, we study several problems in sponsored product optimization for ad systems, including position-based de-biasing, click-conversion multi-task learning, and calibration of the predicted click-through rate (pCTR). We propose a practical machine learning framework that addresses these problems without structural changes to existing machine learning models, so it can be combined with most models, including shallow ones (e.g., gradient boosting decision trees, support vector machines). We first propose data and feature engineering techniques to handle the aforementioned problems in ad systems; we then evaluate the benefit of our framework on real-world data sets drawn from the traffic logs of our online shopping site. We show that the proposed framework, with data and feature engineering, can handle these perennial problems in ad systems and improve multiple evaluation metrics.
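To make the pCTR calibration problem concrete, here is a minimal sketch of how calibration is commonly measured. It is not taken from the paper: the function names `calibration_ratio` and `bucketed_calibration` and the toy data are assumptions for illustration. A well-calibrated model has a ratio near 1.0 and per-bucket mean predictions close to the observed click rates.

```python
# Illustrative sketch only; names and data are hypothetical, not from the paper.

def calibration_ratio(predicted_ctr, clicks):
    """Sum of predicted click probabilities divided by the observed click count."""
    observed = sum(clicks)
    if observed == 0:
        raise ValueError("no observed clicks; ratio is undefined")
    return sum(predicted_ctr) / observed


def bucketed_calibration(predicted_ctr, clicks, n_buckets=2):
    """Return (mean predicted CTR, observed CTR, count) per equal-width pCTR bucket."""
    buckets = [[] for _ in range(n_buckets)]
    for p, c in zip(predicted_ctr, clicks):
        # Clamp so that p == 1.0 falls into the last bucket.
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append((p, c))
    rows = []
    for items in buckets:
        if not items:
            continue  # skip empty buckets
        mean_p = sum(p for p, _ in items) / len(items)
        observed = sum(c for _, c in items) / len(items)
        rows.append((mean_p, observed, len(items)))
    return rows


# Toy impressions: predicted click probability and whether a click occurred.
predicted = [0.1, 0.2, 0.6, 0.9]
clicked = [0, 1, 1, 1]
ratio = calibration_ratio(predicted, clicked)
rows = bucketed_calibration(predicted, clicked)
```

In this toy data the model under-predicts clicks overall, so the ratio falls below 1.0; a calibration step would aim to move it back toward 1.0.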
PaRoT: A Practical Framework for Robust Deep Neural Network Training
Ayers, Edward, Eiras, Francisco, Hawasly, Majd, Whiteside, Iain
Deep Neural Networks (DNNs) are finding important applications in safety-critical systems such as Autonomous Vehicles (AVs), where perceiving the environment correctly and robustly is necessary for safe operation. Raising unique challenges for assurance due to their black-box nature, DNNs pose a fundamental problem for regulatory acceptance of these types of systems. Robust training --- training to minimize excessive sensitivity to small changes in input --- has emerged as one promising technique to address this challenge. However, existing robust training tools are inconvenient to use or apply to existing codebases and models: they typically only support a small subset of model elements and require users to extensively rewrite the training code. In this paper we introduce a novel framework, PaRoT, developed on the popular TensorFlow platform, that greatly reduces the barrier to entry. Our framework enables robust training to be performed on arbitrary DNNs without any rewrites to the model. We demonstrate that our framework's performance is comparable to prior art, and exemplify its ease of use on off-the-shelf, trained models and on a real-world industrial application: training a robust traffic light detection network.
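The abstract does not describe PaRoT's internals, so the following is only a sketch of one common building block of robust training, interval bound propagation, written in plain Python rather than TensorFlow. It shows how an input perturbation of size `eps` can be turned into guaranteed lower and upper bounds on a layer's output; the function names and the toy weights are assumptions for illustration.

```python
# Illustrative sketch only; not PaRoT's API. All names here are hypothetical.

def input_box(x, eps):
    """Lower and upper bounds for every input within an L-infinity ball of radius eps."""
    return [v - eps for v in x], [v + eps for v in x]


def linear_bounds(weights, bias, lower, upper):
    """Propagate elementwise input bounds through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                # A negative weight swaps which input bound gives the extreme.
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# One linear layer followed by ReLU, with the input perturbed by at most 0.5.
lo, hi = input_box([1.0, 1.0], eps=0.5)
lo, hi = linear_bounds([[1.0, -2.0]], [0.5], lo, hi)
lo, hi = relu_bounds(lo, hi)
```

A robust training loss would typically penalize the width of such output bounds, or the worst-case class margin they imply, alongside the usual training loss.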
A Practical Framework for Robust Decision-Theoretic Planning and Execution for Service Robots
Iocchi, Luca (Sapienza University of Rome) | Jeanpierre, Laurent (University of Caen Lower-Normandy) | Lazaro, Maria Teresa (Sapienza University of Rome) | Mouaddib, Abdel-Illah (University of Caen Lower-Normandy)
The deployment of robots in populated environments has recently attracted growing interest because of the increased maturity and capability of this technology. In this context, sophisticated planning techniques are required because of the need to increase the complexity of the tasks that a robot can accomplish. In particular, there is a strong emphasis on service robots, i.e., robots that can satisfy several user needs. In this paper, we present a practical framework based on a decision-theoretic formalism for the generation and execution of robust plans for service robots. The proposed framework has been implemented and successfully tested on service robots interacting with non-expert users in public environments, facing many sources of uncertainty and failures in task execution.
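The abstract does not say which decision-theoretic formalism is used. As a minimal sketch, assuming a finite Markov decision process (MDP), value iteration shows how a plan can account for actions that may fail. The states, actions, probabilities, and rewards below are invented for illustration.

```python
# Illustrative sketch only; the MDP below is hypothetical, not from the paper.

def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-6):
    """Return (values, policy) for a finite MDP.

    transition[s][a] is a list of (probability, next_state) pairs and
    reward[s][a] is the expected immediate reward of taking action a in state s.
    """
    values = {s: 0.0 for s in states}

    def q(s, a):
        # Expected return of taking a in s, then following the current values.
        return reward[s][a] + gamma * sum(p * values[s2] for p, s2 in transition[s][a])

    while True:
        new_values = {s: max(q(s, a) for a in actions) for s in states}
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            break
    policy = {s: max(actions, key=lambda a: q(s, a)) for s in states}
    return values, policy


# A task that succeeds with probability 0.8 and must be retried otherwise.
states = ["start", "done"]
actions = ["try", "wait"]
transition = {
    "start": {"try": [(0.8, "done"), (0.2, "start")], "wait": [(1.0, "start")]},
    "done": {"try": [(1.0, "done")], "wait": [(1.0, "done")]},
}
reward = {
    "start": {"try": 7.0, "wait": 0.0},
    "done": {"try": 0.0, "wait": 0.0},
}
values, policy = value_iteration(states, actions, transition, reward)
```

In this toy problem the resulting policy retries the task from `start` until it succeeds, because the expected reward of trying outweighs waiting.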