Layer Probing Improves Kinase Functional Prediction with Protein Language Models

Kumar, Ajit, Jha, IndraPrakash

arXiv.org Artificial Intelligence 

Protein language models (PLMs) have transformed sequence-based protein analysis, yet most applications rely only on final-layer embeddings, which may overlook biologically meaningful information encoded in earlier layers. We systematically evaluate all 33 layers of ESM-2 for kinase functional prediction using both unsupervised clustering and supervised classification. We show that mid-to-late transformer layers (layers 20-33) outperform the final layer by 32 percent in unsupervised Adjusted Rand Index and improve homology-aware supervised accuracy to 75.7 percent. Domain-level extraction, calibrated probability estimates, and a reproducible benchmarking pipeline further strengthen reliability. Our results demonstrate that transformer depth contains functionally distinct biological signals and that principled layer selection significantly improves kinase function prediction.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found