Conditional GAN for Enhancing Diffusion Models in Efficient and Authentic Global Gesture Generation from Audios