Speaker recognition is a well-known and widely studied task in the speech processing domain. It has many applications, ranging from security to speaker adaptation of personal devices. In this paper, we present a new paradigm for automatic speaker recognition that we call Interactive Speaker Recognition (ISR). In this paradigm, the recognition system incrementally builds a representation of a speaker by requesting personalized utterances, in contrast to the standard text-dependent or text-independent schemes. To do so, we cast the speaker recognition task as a sequential decision-making problem that we solve with Reinforcement Learning. Using a standard dataset, we show that our method achieves excellent performance while requiring only small amounts of speech signal. This method could also be applied as an utterance selection mechanism for building speech synthesis systems.
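The interaction loop behind this paradigm can be illustrated with a toy sketch. Everything here is an illustrative assumption, not the paper's actual method: the environment class `ISREnv`, the five-word vocabulary, the scalar "voiceprint" per word, the Gaussian noise model, the nearest-profile identifier, and the random word-selection stub standing in for the learned RL policy.

```python
import random

# Toy sketch of an Interactive Speaker Recognition (ISR) loop.
# All names and the noise model are illustrative assumptions.

WORDS = ["zero", "one", "two", "three", "four"]  # words the agent may request

class ISREnv:
    """Toy environment: K enrolled speakers, each with a per-word 'voiceprint'."""
    def __init__(self, n_speakers=3, noise=0.3, seed=0):
        self.rng = random.Random(seed)
        self.noise = noise
        # each speaker maps each word to a scalar embedding (toy stand-in
        # for a real speaker representation)
        self.profiles = [
            {w: self.rng.gauss(0.0, 1.0) for w in WORDS}
            for _ in range(n_speakers)
        ]

    def reset(self):
        """A new guest arrives; their identity is hidden from the agent."""
        self.guest = self.rng.randrange(len(self.profiles))
        return self.guest

    def request(self, word):
        """Agent asks the guest to say `word`; returns a noisy observation."""
        clean = self.profiles[self.guest][word]
        return clean + self.rng.gauss(0.0, self.noise)

def identify(env, observations):
    """Nearest-profile decision over the requested words (toy identifier)."""
    def dist(speaker_id):
        return sum((env.profiles[speaker_id][w] - o) ** 2
                   for w, o in observations.items())
    return min(range(len(env.profiles)), key=dist)

def run_episode(env, policy, budget=3):
    """Sequential decision loop: request `budget` words, guess, reward 1/0."""
    truth = env.reset()
    obs = {}
    for _ in range(budget):
        word = policy(obs)  # an RL agent would learn this choice; here a stub
        obs[word] = env.request(word)
    return 1 if identify(env, obs) == truth else 0

def random_policy(obs):
    """Placeholder for the learned word-selection policy."""
    remaining = [w for w in WORDS if w not in obs]
    return random.choice(remaining)

env = ISREnv(seed=42)
random.seed(42)
accuracy = sum(run_episode(env, random_policy) for _ in range(200)) / 200
```

Under this setup, even a random request policy identifies the guest well above the one-in-three chance level; the point of the RL formulation is to learn *which* words to request so the same accuracy is reached with a smaller utterance budget.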
TELLING a yellow taxi and a pair of binoculars apart is so easy most people could do it standing on their head. Not so for an artificial intelligence: flip the cab upside down and it sees binoculars. This is just one of dozens of examples that show AI is a lot worse at identifying objects by sight than many people realise.
Provides a comprehensive introduction to key issues and findings in object recognition in experimental, neural, computational, and applied domains. Emphasizes the problem of representation, exploring the issue of how 3-D objects should be encoded so as to efficiently recognize them from 2-D images. Second half focuses on face recognition, an ecologically important instance of the general object recognition problem. Describes experimental studies of human face recognition performance and recent attempts to mimic this ability in artificial computational systems.