Evaluating Multiple Instance Learning Strategies for Automated Sebocyte Droplet Counting

Adelipour, Maryam; Carneiro, Gustavo; Kim, Jeongkwon

arXiv.org Artificial Intelligence 

Sebocytes are lipid-secreting cells whose differentiation is marked by the accumulation of intracellular lipid droplets, making their quantification a key readout in sebocyte biology. Manual counting is labor-intensive and subjective, motivating automated solutions. Here, we introduce a simple attention-based multiple instance learning (MIL) framework for sebocyte image analysis. Nile Red-stained sebocyte images were annotated into 14 classes according to droplet counts and expanded via data augmentation to about 50,000 cells. Two models were benchmarked: a baseline multi-layer perceptron (MLP) trained on aggregated patch-level counts, and an attention-based MIL model leveraging precomputed ResNet-50 feature embeddings with trainable instance weighting. Experiments using five-fold cross-validation showed that the baseline MLP achieved more stable performance (mean MAE = 5.6) than the attention-based MIL model, which was less consistent (mean MAE = 10.7) but occasionally superior in specific folds. These findings indicate that simple bag-level aggregation provides a robust baseline for slide-level droplet counting, while attention-based MIL requires task-aligned pooling and regularization to fully realize its potential in sebocyte image analysis.
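The trainable instance weighting mentioned above can be illustrated with a minimal sketch. The abstract does not specify the attention form, so this assumes a commonly used tanh-based attention pooling, in which each instance embedding receives a score, the scores are normalized with a softmax, and the bag embedding is the weighted sum of the instances. The names `attention_pool`, `V`, and `w`, and the toy two-dimensional embeddings, are hypothetical and stand in for the ResNet-50 features and learned parameters. The mean absolute error (MAE) helper shows the metric reported per fold.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    instances: Sequence[Vector], V: Sequence[Vector], w: Vector
) -> Tuple[Vector, List[float]]:
    """Pool K instance embeddings (each of dimension D) into one bag embedding.

    Assumed form (not confirmed by the source): score_k = w . tanh(V h_k),
    where V is an L x D matrix and w has length L.
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wl * zl for wl, zl in zip(w, hidden)))
    weights = softmax(scores)
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between true and predicted counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy usage: three patch embeddings form one bag (illustrative values only).
toy_instances = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_V = [[0.5, -0.5], [0.25, 0.75]]  # hypothetical learned projection
toy_w = [1.0, -1.0]                  # hypothetical learned scoring vector
toy_bag, toy_weights = attention_pool(toy_instances, toy_V, toy_w)
```

When the scoring vector is all zeros, every instance receives the same weight and the pooling reduces to a plain mean over instances, which is one way to see how attention pooling relates to simple bag-level aggregation.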