Deep Learning
Speech Synthesis using Deep Learning
Voice assistance on your phone is nothing new. Be it Alice, Cortana or Siri, they have all been assisting us with minor chores through our otherwise busy lives. You don't need to listen carefully to notice the monotony in the speech, and hence one never banks fully on the assistant. Psychologically, a person has never found a 'spark' of sorts with their assistant. Since most systems today need to be trained or taught by humans, it is almost impossible for us to pre-program an assistant that adapts to every consumer.
A night at the AI jazz club
It's a Wednesday night in North East London and upstairs at the Vortex Jazz Club the machines are calling the shots. The human spectators are jiggling happily in their seats, and the musicians are undeniably flesh-and-blood, sweating and straining at their instruments. But the music itself is the product of electronic brains -- trained to soak up the music of great artists and strain out new melodies. This is "the first concert consisting almost entirely of music composed by artificial intelligence," says professor Geraint Wiggins of Queen Mary's University at the beginning of the evening. In a few minutes we'll be listening to Medieval chants, Baroque chorales, and jazz and pop -- all made by artificial intelligence with the help of computer scientists who programmed the evening's "composers."
Towards end-to-end optimisation of functional image analysis pipelines
Vilamala, Albert, Madsen, Kristoffer Hougaard, Hansen, Lars Kai
The study of neurocognitive tasks requiring accurate localisation of activity often relies on functional Magnetic Resonance Imaging, a widely adopted technique that makes use of a pipeline of data processing modules, each involving a variety of parameters. These parameters are frequently set according to the local goal of each specific module, not accounting for the rest of the pipeline. Given the recent success of neural network research in many different domains, we propose to convert the whole data pipeline into a deep neural network, where the parameters involved are jointly optimised by the network to best serve a common global goal. As a proof of concept, we develop a module able to adaptively apply the most suitable spatial smoothing to every brain volume for each specific neuroimaging task, and we validate its results in a standard brain decoding experiment.
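As a rough illustration of what a smoothing module with a tunable parameter looks like, the sketch below applies Gaussian smoothing of width sigma to a 1D signal. It is only a stand-in under stated assumptions: the paper works on brain volumes and learns the smoothing inside a network, whereas here sigma is set by hand, the data is one-dimensional, and the names gaussian_kernel and smooth are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Return a normalised 1D Gaussian kernel of width sigma."""
    if radius is None:
        # Cover roughly three standard deviations on each side.
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Smooth a 1D signal with a Gaussian kernel (edges are replicated)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Clamp the index so samples beyond the border reuse the edge value.
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


# Toy usage: the same spike smoothed with two hand-picked widths.
spike = [0.0] * 21
spike[10] = 1.0
narrow = smooth(spike, 0.5)
wide = smooth(spike, 2.0)
```

In an end-to-end setting, sigma would be one of the parameters adjusted by gradient descent on the final task loss rather than a fixed argument.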
Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder
Hsu, Chin-Cheng, Hwang, Hsin-Te, Wu, Yi-Chiao, Tsao, Yu, Wang, Hsin-Min
We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence for learning conversion functions or for synthesizing a target spectrum with the aid of alignments. However, these requirements gravely limit the scope of practical applications of SC due to scarcity or even unavailability of parallel corpora. We propose an SC framework based on variational auto-encoder which enables us to exploit non-parallel corpora. The framework comprises an encoder that learns speaker-independent phonetic representations and a decoder that learns to reconstruct the designated speaker. It removes the requirement of parallel corpora or phonetic alignments to train a spectral conversion system. We report objective and subjective evaluations to validate our proposed method and compare it to SC methods that have access to aligned corpora.
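The encoder/decoder split described above can be sketched with the two standard variational auto-encoder ingredients: the reparameterised latent sample and the KL term against a standard normal prior. This is a minimal sketch, not the authors' implementation; the encoder and decoder callables, the convert helper, and the choice to sample the latent code at conversion time are all assumptions made for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def convert(frame, target_speaker, encoder, decoder, rng):
    """Hypothetical conversion step: encode a source frame, decode as the target.

    encoder(frame) -> (mu, log_var) is assumed to be speaker-independent;
    decoder(z, speaker) is assumed to be conditioned on the speaker identity.
    """
    mu, log_var = encoder(frame)
    z = reparameterize(mu, log_var, rng)
    return decoder(z, target_speaker)


# Toy usage with stand-in functions (not trained models).
toy_encoder = lambda frame: (list(frame), [-100.0] * len(frame))
toy_decoder = lambda z, speaker: [v + speaker for v in z]
converted = convert([1.0, 2.0], 10.0, toy_encoder, toy_decoder, random.Random(0))
```

Because the decoder alone carries the speaker identity, training needs only each speaker's own frames, which is what removes the need for parallel corpora.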
A Survey of Voice Translation Methodologies - Acoustic Dialect Decoder
Krupakar, Hans, Rajvel, Keerthika, B, Bharathi, S, Angel Deborah, Krishnamurthy, Vallidevi
Speech Translation has always been about giving source text or audio input and waiting for the system to give translated output in the desired form. In this paper, we present the Acoustic Dialect Decoder (ADD) - a voice to voice ear-piece translation device. We introduce and survey the recent advances made in the field of Speech Engineering, to employ in the ADD, particularly focusing on the three major processing steps of Recognition, Translation and Synthesis. We tackle the problem of machine understanding of natural language by designing a recognition unit for source audio to text, a translation unit for source language text to target language text, and a synthesis unit for target language text to target language speech. Speech from the surroundings will be recorded by the recognition unit present on the ear-piece and translation will start as soon as one sentence is successfully read. This way, we hope to give translated output as and when input is being read. The recognition unit will use Hidden Markov Models (HMMs) Based Tool-Kit (HTK), hybrid RNN systems with gated memory cells, and the synthesis unit, HMM based speech synthesis system HTS. This system will initially be built as an English to Tamil translation device.
Accelerate Monte Carlo Simulations with Restricted Boltzmann Machines
Beijing National Lab for Condensed Matter Physics and Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
Despite their exceptional flexibility and popularity, Monte Carlo methods often suffer from slow mixing times for challenging statistical physics problems. We present a general strategy to overcome this difficulty by adopting ideas and techniques from the machine learning community. We fit the unnormalized probability of the physical model to a feedforward neural network and reinterpret the architecture as a restricted Boltzmann machine. Then, exploiting its feature detection ability, we utilize the restricted Boltzmann machine for efficient Monte Carlo updates and to speed up the simulation of the original physical system. We implement these ideas for the Falicov-Kimball model and demonstrate improved acceptance ratio and autocorrelation time near the phase transition point. The Monte Carlo method is one of the most flexible and powerful methods for studying many-body systems [1, 2]. Monte Carlo methods randomly sample configurations and obtain the answer as a statistical average.
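For readers unfamiliar with the baseline being accelerated, here is a minimal sketch of plain Metropolis sampling: configurations are proposed at random, accepted with probability min(1, exp(-beta * dE)), and the observable is averaged over the samples. It deliberately uses a toy 1D Ising chain with single-spin flips, not the Falicov-Kimball model or the restricted Boltzmann machine updates of the paper, and the function name and parameters are hypothetical.

```python
import math
import random


def metropolis_ising_chain(n_spins=32, beta=0.5, n_sweeps=2000,
                           burn_in=500, seed=0):
    """Estimate the mean energy per spin of a periodic 1D Ising chain.

    Energy is E = -sum_i s_i * s_{i+1} with coupling J = 1.
    Returns (mean_energy_per_spin, acceptance_ratio).
    """
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    energies = []
    accepted = 0
    proposed = 0
    for sweep in range(n_sweeps):
        for _ in range(n_spins):
            i = rng.randrange(n_spins)
            left = spins[(i - 1) % n_spins]
            right = spins[(i + 1) % n_spins]
            # Energy change caused by flipping spin i.
            delta_e = 2 * spins[i] * (left + right)
            proposed += 1
            # Metropolis rule: always accept downhill moves, uphill with
            # probability exp(-beta * delta_e).
            if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
                spins[i] = -spins[i]
                accepted += 1
        if sweep >= burn_in:
            energy = -sum(spins[j] * spins[(j + 1) % n_spins]
                          for j in range(n_spins))
            energies.append(energy / n_spins)
    return sum(energies) / len(energies), accepted / proposed


mean_energy, acceptance = metropolis_ising_chain()
```

For a long chain the exact energy per spin is -tanh(beta), which gives a simple check on the estimate. Local updates like these are what mix slowly near a phase transition; the approach in the abstract replaces them with proposals drawn from a fitted restricted Boltzmann machine.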
AI accountability needs action now, say UK MPs
A UK parliamentary committee has urged the government to act proactively -- and to act now -- to tackle "a host of social, ethical and legal questions" arising from growing usage of autonomous technologies such as artificial intelligence. "While it is too soon to set down sector-wide regulations for this nascent field, it is vital that careful scrutiny of the ethical, legal and societal dimensions of artificially intelligent systems begins now," says the committee. "Not only would this help to ensure that the UK remains focused on developing 'socially beneficial' AI systems, it would also represent an important step towards fostering public dialogue about, and trust in, such systems over time." The committee kicked off an enquiry into AI and robotics this March, going on to take 67 written submissions and hear from 12 witnesses in person, in addition to visiting Google DeepMind's London office. Publishing its report into robotics and AI today, the Science and Technology committee flags up several issues that it says need "serious, ongoing consideration" -- including: "[W]itnesses were clear that the ethical and legal matters raised by AI deserved attention now and that suitable governance frameworks were needed," it notes in the report.
Google's secretive DeepMind AI learns to navigate the London Underground
With 270 stations across 11 lines, navigating your way through the London Underground can be a difficult task. But Google's DeepMind artificial intelligence lab has taught a machine to navigate the intricate system on its own - something many tourists to the capital fail to master. The system combines data processing with self-learning code to navigate the network using human-like memory and reason. While the task of navigating the London Underground is fairly simple, the way in which the new system learned it is innovative. The system works by combining an external memory with deep learning - allowing the programme to learn on its own rather than being aided by a human.
DeepMind's AI has learned to navigate the Tube using memory
DeepMind's latest AI has a "working memory" so that it can learn how to solve tasks for itself -- such as how best to get from A to B on the London tube network. "The thing can learn to compute what it has to, rather than being programmed," says Murray Shanahan at Imperial College, London, who wasn't involved with the work. Called a Differentiable Neural Computer (DNC), the system succeeds because it combines neural networks, which are good at learning but not so good at storing data, with an external memory. It can retrieve items from its memory in the order they were recorded -- a key innovation that ensures they don't get overwritten too quickly and helps the system tackle complicated data it hasn't seen before. The DNC works out how to interpret a data set on its own, following some basic training on random graphs.
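To make "a neural network with an external memory" a little more concrete, the sketch below shows one ingredient of such systems: a soft, content-based read, where a query key is compared with every memory row by cosine similarity and the result is a softmax-weighted blend of the rows. It is an illustration under stated assumptions rather than DeepMind's code; the function names and the strength parameter are hypothetical, and the order-of-writing retrieval mentioned in the article is not modelled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def content_read(memory, key, strength=10.0):
    """Read from memory by similarity to key.

    memory is a list of rows (lists of floats); returns (weights, read_vector).
    A larger strength makes the read focus more sharply on the best match.
    """
    scores = [strength * cosine(row, key) for row in memory]
    top = max(scores)
    # Softmax over the scores, shifted by the maximum for numerical stability.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    width = len(memory[0])
    read = [sum(w * row[j] for w, row in zip(weights, memory))
            for j in range(width)]
    return weights, read


# Toy usage: three stored rows and a noisy query closest to the second row.
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, read = content_read(memory, [0.0, 0.9, 0.1])
```

Because every step is a smooth function of the key and the memory contents, the read can be trained with gradient descent along with the rest of the network, which is what "differentiable" refers to in the system's name.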