Robust Feature Learning on Long-Duration Sounds for Acoustic Scene Classification