On the effectiveness of multimodal privileged knowledge distillation in two vision transformer based diagnostic applications