Speech Modelling Using Subspace and EM Techniques

Neural Information Processing Systems 

The speech waveform can be modelled as a piecewise-stationary linear stochastic state space system, and its parameters can be estimated using an expectation-maximisation (EM) algorithm. One problem is the initialisation of the EM algorithm: standard initialisation schemes can lead to poor formant trajectories, yet these trajectories are important for vowel intelligibility. The aim of this paper is to investigate the suitability of subspace identification methods for initialising EM.
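
As a rough illustration of the model class named above, the following sketch simulates one stationary segment of a linear stochastic state space system, x[t+1] = A x[t] + w[t], y[t] = C x[t] + v[t]. It is not the authors' implementation. The function name `simulate_segment`, the second-order companion-form `A`, the noise covariances `Q` and `R`, the 8 kHz sampling rate, and the 500 Hz resonance are all assumptions chosen for illustration. The link between the poles of `A` and a formant-like resonance frequency is likewise shown under these assumptions only.

```python
import numpy as np


def simulate_segment(A, C, Q, R, n_samples, seed=0):
    """Simulate x[t+1] = A x[t] + w[t], y[t] = C x[t] + v[t] for one segment.

    w[t] ~ N(0, Q) is the state noise, v[t] ~ N(0, R) the scalar output noise.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    chol_Q = np.linalg.cholesky(Q)  # requires Q to be positive definite
    x = np.zeros(n)
    y = np.empty(n_samples)
    for t in range(n_samples):
        y[t] = C @ x + np.sqrt(R) * rng.standard_normal()
        x = A @ x + chol_Q @ rng.standard_normal(n)
    return y


# Hypothetical segment with a single resonance (illustrative values only).
fs = 8000.0                         # assumed sampling rate in Hz
freq, bandwidth = 500.0, 100.0      # assumed resonance frequency and bandwidth in Hz
r = np.exp(-np.pi * bandwidth / fs)  # pole radius
theta = 2.0 * np.pi * freq / fs      # pole angle in radians per sample

# Companion-form state matrix with poles at r * exp(+/- j * theta).
A = np.array([[2.0 * r * np.cos(theta), -r ** 2],
              [1.0, 0.0]])
C = np.array([1.0, 0.0])
Q = 0.1 * np.eye(2)
R = 0.01

y = simulate_segment(A, C, Q, R, n_samples=200)

# Recover the resonance frequency from the eigenvalues (poles) of A.
poles = np.linalg.eigvals(A)
est_freq = abs(np.angle(poles[0])) * fs / (2.0 * np.pi)
```

A piecewise-stationary signal would, under the same assumptions, be built by concatenating segments generated with different `A`, `C`, `Q` and `R`. An estimation procedure such as EM would work in the opposite direction, recovering those parameters from `y`.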