Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion