A Benchmark and Robustness Study of In-Context-Learning with Large Language Models in Music Entity Detection