Modeling Natural Sounds with Modulation Cascade Processes

Richard Turner, Maneesh Sahani

Neural Information Processing Systems 

Natural sounds are structured on many time-scales. A typical segment of speech, for example, contains features that span four orders of magnitude: sentences (~1 s); phonemes (~0.1 s); glottal pulses (~0.01 s); and formants (~0.001 s). The auditory system uses information from each of these time-scales to solve complicated tasks such as auditory scene analysis. One route toward understanding how auditory processing accomplishes this analysis is to build neuroscience-inspired algorithms that solve similar tasks and to compare their properties with those of auditory processing. There is, however, a discord: current machine-audition algorithms largely concentrate on the shorter time-scale structures in sounds and ignore the longer ones.