Toward Robust and Efficient ML-Based GPU Caching for Modern Inference