The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs