I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch
arXiv.org Artificial Intelligence
Growing research demonstrates that synthetic failure modes imply poor generalization. We compare commonly used audio-to-audio losses on a synthetic benchmark, measuring the pitch distance between two stationary sinusoids. The results are surprising: many have a poor sense of pitch direction. These shortcomings are exposed using simple rank assumptions. Our task is trivial for humans but difficult for these audio distances, suggesting significant progress can be made in self-supervised audio learning by improving current losses.
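The kind of check the abstract describes can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' benchmark: the sample rate, duration, the Hann-windowed magnitude-spectrum L1 distance, and the helper names `sinusoid`, `spectral_l1`, and `rank_violations` are all assumptions made here. The rank assumption is read as "a candidate farther in pitch from the reference should receive a larger distance".

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz (assumed for illustration)
DURATION = 0.5       # seconds (assumed for illustration)


def sinusoid(freq_hz, sample_rate=SAMPLE_RATE, duration=DURATION):
    """Return a stationary sine wave at the given frequency."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    return np.sin(2.0 * np.pi * freq_hz * t)


def spectral_l1(x, y):
    """Mean absolute difference between Hann-windowed magnitude spectra.

    A stand-in for a spectrally-based audio distance; not taken from the paper.
    """
    window = np.hanning(len(x))
    mag_x = np.abs(np.fft.rfft(x * window))
    mag_y = np.abs(np.fft.rfft(y * window))
    return float(np.mean(np.abs(mag_x - mag_y)))


def rank_violations(distance_fn, reference_hz, candidate_hz):
    """Count rank violations of a distance against pitch ordering.

    Candidates are sorted by their absolute log-frequency gap to the
    reference. A violation is an adjacent pair in that order where the
    candidate farther in pitch does not get a strictly larger distance.
    Returns (violation_count, distances_in_sorted_order).
    """
    candidates = np.asarray(candidate_hz, dtype=float)
    gaps = np.abs(np.log2(candidates / reference_hz))
    order = np.argsort(gaps)
    reference = sinusoid(reference_hz)
    dists = np.array(
        [distance_fn(reference, sinusoid(candidates[i])) for i in order]
    )
    return int(np.sum(np.diff(dists) <= 0)), dists


if __name__ == "__main__":
    violations, dists = rank_violations(
        spectral_l1, 440.0, [450.0, 500.0, 880.0, 1760.0]
    )
    print("distances (nearest to farthest pitch):", dists)
    print("rank violations:", violations)
```

With a distance of this form, one would expect the values to flatten once the two spectral peaks no longer overlap, so candidates far from the reference are scored almost identically regardless of how far away they are. Any other distance with the same two-argument signature can be passed as `distance_fn` for comparison.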
Dec-9-2020
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America
- United States
- Utah > Salt Lake County
- Salt Lake City (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California > Santa Clara County
- Sunnyvale (0.04)
- Arizona > Maricopa County
- Phoenix (0.04)
- Puerto Rico > San Juan
- San Juan (0.04)
- Canada
- Europe
- Asia
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Genre:
- Research Report (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Speech (1.00)
- Natural Language (1.00)
- Machine Learning > Neural Networks (1.00)