Goto

Collaborating Authors

 specificity








Attack-Aware Noise Calibration for Differential Privacy

Neural Information Processing Systems

Differential privacy (DP) is a widely used approach for mitigating privacy risks when training machine learning models on sensitive data. DP mechanisms add noise during training to limit the risk of information leakage. The scale of the added noise is critical, as it determines the trade-off between privacy and utility. The standard practice is to select the noise scale to satisfy a given privacy budget ε. This privacy budget is in turn interpreted in terms of operational attack risks, such as accuracy, sensitivity, and specificity of inference attacks aimed to recoverinformation about the training data records.


Exploring Protein Language Model Architecture-Induced Biases for Antibody Comprehension

Mengren, null, Liu, null, Zhang, Yixiang, Yiming, null, Zhang, null

arXiv.org Artificial Intelligence

Recent advances in protein language models (PLMs) have demonstrated remarkable capabilities in understanding protein sequences. However, the extent to which different model architectures capture antibody-specific biological properties remains unexplored. In this work, we systematically investigate how architectural choices in PLMs influence their ability to comprehend antibody sequence characteristics and functions. We evaluate three state-of-the-art PLMs-AntiBERTa, BioBERT, and ESM2--against a general-purpose language model (GPT-2) baseline on antibody target specificity prediction tasks. Our results demonstrate that while all PLMs achieve high classification accuracy, they exhibit distinct biases in capturing biological features such as V gene usage, somatic hypermutation patterns, and isotype information. Through attention attribution analysis, we show that antibody-specific models like AntiBERTa naturally learn to focus on complementarity-determining regions (CDRs), while general protein models benefit significantly from explicit CDR-focused training strategies. These findings provide insights into the relationship between model architecture and biological feature extraction, offering valuable guidance for future PLM development in computational antibody design.


SweetDeep: A Wearable AI Solution for Real-Time Non-Invasive Diabetes Screening

Henriques, Ian, Elhassar, Lynda, Relekar, Sarvesh, Walrave, Denis, Hassantabar, Shayan, Ghanakota, Vishu, Laoui, Adel, Aich, Mahmoud, Tir, Rafia, Zerguine, Mohamed, Louafi, Samir, Kimouche, Moncef, Cosson, Emmanuel, Jha, Niraj K

arXiv.org Artificial Intelligence

The global rise in type 2 diabetes underscores the need for scalable and cost-effective screening methods. Current diagnosis requires biochemical assays, which are invasive and costly. Advances in consumer wearables have enabled early explorations of machine learning-based disease detection, but prior studies were limited to controlled settings. We present SweetDeep, a compact neural network trained on physiological and demographic data from 285 (diabetic and non-diabetic) participants in the EU and MENA regions, collected using Samsung Galaxy Watch 7 devices in free-living conditions over six days. Each participant contributed multiple 2-minute sensor recordings per day, totaling approximately 20 recordings per individual. Despite comprising fewer than 3,000 parameters, SweetDeep achieves 82.5% patient-level accuracy (82.1% macro-F1, 79.7% sensitivity, 84.6% specificity) under three-fold cross-validation, with an expected calibration error of 5.5%. Allowing the model to abstain on less than 10% of low-confidence patient predictions yields an accuracy of 84.5% on the remaining patients. These findings demonstrate that combining engineered features with lightweight architectures can support accurate, rapid, and generalizable detection of type 2 diabetes in real-world wearable settings.


Identifying environmental factors associated with tetrodotoxin contamination in bivalve mollusks using eXplainable AI

Schoppema, M. C., van der Velden, B. H. M., Hürriyetoğlu, A., Klijnstra, M. D., Faassen, E. J., Gerssen, A., van der Fels-Klerx, H. J.

arXiv.org Artificial Intelligence

Since 2012, tetrodotoxin (TTX) has been found in seafoods such as bivalve mollusks in temperate European waters. TTX contamination leads to food safety risks and economic losses, making early prediction of TTX contamination vital to the food industry and competent authorities. Recent studies have pointed to shallow habitats and water temperature as main drivers to TTX contamination in bivalve mollusks. However, the temporal relationships between abiotic factors, biotic factors, and TTX contamination remain unexplored. We have developed an explainable, deep learning-based model to predict TTX contamination in the Dutch Zeeland estuary. Inputs for the model were meteorological and hydrological features; output was the presence or absence of TTX contamination. Results showed that the time of sunrise, time of sunset, global radiation, water temperature, and chloride concentration contributed most to TTX contamination. Thus, the effective number of sun hours, represented by day length and global radiation, was an important driver for tetrodotoxin contamination in bivalve mollusks. To conclude, our explainable deep learning model identified the aforementioned environmental factors (number of sun hours, global radiation, water temperature, and water chloride concentration) to be associated with tetrodotoxin contamination in bivalve mollusks; making our approach a valuable tool to mitigate marine toxin risks for food industry and competent authorities.