Who Said That? A Comparative Study of Non-Negative Matrix Factorisation and Deep Learning Techniques
Krikke, Teun F. (Heriot-Watt University) | Broz, Frank (Heriot-Watt University) | Lane, David (Heriot-Watt University)
When working with robots, it is very important that the robot understands the user. This is more difficult when speech is the only way the user can give instructions: you do not want a robot to call for milk when the user said "call for help". In a lab environment, where there is no noise or reverberation to distort the instructions, a robot can get a clear understanding of the user. In a normal setting, however, this is not always the case. We concentrate on speaker separation to improve speech recognition, using different non-negative matrix factorisation (NMF) and deep learning techniques. For training and testing these techniques, we introduce a new corpus recorded with a microphone array. We found that adding directional information improves the separation when there is no noise or reverberation. However, when reverberation is present, the NMF technique with the Itakura-Saito cost function outperforms the other techniques. With deep learning, we found that a recurrent neural network is able to separate the speakers.
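The abstract does not describe how the NMF variant is implemented. The following is a minimal sketch, not the authors' code, assuming the standard multiplicative update rules for NMF under the Itakura-Saito divergence (the β = 0 case of the β-divergence). A non-negative power spectrogram V (frequency × time) is approximated as W @ H, where W holds spectral templates and H their activations over time. The names `nmf_is`, `is_divergence`, `rank`, `n_iter` and `eps` are hypothetical.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence D_IS(V | V_hat), summed over all entries."""
    ratio = V / V_hat
    return float(np.sum(ratio - np.log(ratio) - 1.0))


def nmf_is(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Hypothetical helper: factorise a non-negative matrix V (freq x time)
    as V ~ W @ H using multiplicative updates for the Itakura-Saito cost.

    Returns W (freq x rank), H (rank x time) and the cost after each sweep.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    V = V + eps  # the IS divergence is undefined for zero entries
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    costs = [is_divergence(V, W @ H)]
    for _ in range(n_iter):
        # Update activations H with the templates W held fixed.
        V_hat = W @ H
        H *= (W.T @ (V * V_hat ** -2)) / (W.T @ (V_hat ** -1))
        # Update templates W with the new activations H held fixed.
        V_hat = W @ H
        W *= ((V * V_hat ** -2) @ H.T) / ((V_hat ** -1) @ H.T)
        costs.append(is_divergence(V, W @ H))
    return W, H, costs
```

A known property of the Itakura-Saito divergence is scale invariance, D_IS(λx | λy) = D_IS(x | y), so low-energy spectral components are weighted as heavily as high-energy ones. For separation, the components in W would then be assigned to individual speakers and each speaker's share of V recovered, for example with a soft mask; that assignment step is not shown here.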
Oct-31-2017