The bag-of-frames approach: a not so sufficient model for urban soundscapes

Lagrange, Mathieu, Lafay, Grégoire, Defreville, Boris, Aucouturier, Jean-Julien

arXiv.org Machine Learning 

Further, recent psychoacoustical evidence suggests the approach bears some resemblance to human auditory processing for sound textures (McDermott et al., 2013; Nelken and de Cheveigné, 2013). In an influential 2007 article, Aucouturier, Defreville & Pachet (Aucouturier et al., 2007) applied a bag-of-frames (BOF) model to categorize both polyphonic music and soundscapes. Their results showed that, while BOF was a reasonable model for their polyphonic music dataset, it was spectacularly effective for soundscapes, reaching accuracies of 96%. They attributed the contrast to differences in the temporal structure of the two types of stimuli, with music being more formally organized and soundscapes more easily summarized by statistics. In a later companion study (Aucouturier and Defreville, 2009), they showed that soundscapes could be time-shuffled without altering listeners' perception of their acoustic similarity, while music could not. While more work was needed for music, the authors therefore concluded that BOF was a sufficient model to approximate human perception for soundscapes, practically ruling out the need to recognize the local acoustic events in a texture in order to identify it.
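The time-shuffling argument rests on the fact that a BOF summary discards frame order. The following minimal sketch illustrates this under simplifying assumptions that are not taken from the cited studies: frames are synthetic random vectors rather than real audio features, and the summary is a per-dimension mean and variance rather than the statistical model used in the original work. The function name `bag_of_frames` is hypothetical.

```python
import math
import random
import statistics


def bag_of_frames(frames):
    """Summarize frames by per-dimension mean and variance, ignoring order."""
    # Transpose so that each tuple holds all values of one feature dimension.
    dims = list(zip(*frames))
    means = [statistics.fmean(d) for d in dims]
    variances = [statistics.pvariance(d) for d in dims]
    return means + variances


# Assumption: 200 synthetic frames of 4 features stand in for real audio frames.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]

# Time-shuffle a copy of the frame sequence.
shuffled = frames[:]
rng.shuffle(shuffled)

summary_original = bag_of_frames(frames)
summary_shuffled = bag_of_frames(shuffled)

# The two summaries agree up to floating-point tolerance.
same_summary = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for a, b in zip(summary_original, summary_shuffled)
)
print(same_summary)
```

Because the summary is identical for the original and the shuffled sequence, any classifier built only on such statistics cannot distinguish the two, which is why insensitivity of listeners to time-shuffling was read as support for BOF on soundscapes.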
