From Loop Nests to Silicon: Mapping AI Workloads onto AMD NPUs with MLIR-AIR