Rethinking Audio-visual Synchronization for Active Speaker Detection