Benchmarking Large Language Models for Image Classification of Marine Mammals