Learning Robot Manipulation from Audio World Models