
Reviews: Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

Neural Information Processing Systems

This work offers a clearly defined extension to TTS systems that allows building good-quality voices (even ones unseen during training of either component) from a few adaptation data points. The authors do not seem to offer any truly new theoretical extension to the "building blocks" of their system, which is based on known components proposed elsewhere (the speaker encoder, synthesizer and vocoder are all based on previously published models). However, their combination is clever and well engineered, and it allows the building blocks to be estimated independently, in either unsupervised (speaker encoder, where audio transcripts are not needed) or supervised (speech synthesizer) ways, on different corpora. This allows for greater flexibility while reducing the requirement for large amounts of transcribed data for each of the components (i.e.

Good points:
- clear, fair and convincing experiments
- trained and evaluated on public corpora, which greatly increases reproducibility (a portion of the experiments is carried out on proprietary data, but all have equivalent experiments constrained to publicly available data)

Weak points:
- it would probably make sense to investigate the additional adaptability when more data per speaker is available; it seems your system cannot easily leverage more than 10s of reference speech data

Summary: this is a very good study on generating multi-speaker TTS systems from small amounts of target speaker data.


UK eases data mining laws to support flourishing AI industry

#artificialintelligence

The UK is set to ease data mining laws in a move designed to further boost its flourishing AI industry. We all know that data is vital to AI development. Tech giants are in an advantageous position because they either already hold large datasets or can afford to pay for the data they need. Most startups rely on mining data to get started. Europe has notoriously strict data laws.


More power, greater flexibility for AI at the edge in transport use and smart cities - IoT Now Transport

#artificialintelligence

AAEON, a specialist in artificial intelligence (AI) edge solutions, has released the BOXER-8251AI, an AI edge box PC powered by the NVIDIA Jetson Xavier NX. The BOXER-8251AI is said to offer greater performance in a more compact design. Featuring a six-core 64-bit ARM processor, it boasts 384 CUDA cores, 48 Tensor Cores, and two NVDLA engines capable of running multiple neural networks in parallel, delivering accelerated computing performance of up to 21 TOPS. Built to bring dedicated AI processing to the edge, the system also features 8GB of LPDDR4 memory and 16GB of onboard eMMC storage that is expandable through the Micro-SD card slot.


Scribe Healthcare Interactive Includes Customizable Cloud Features for Greater Flexibility

AITopics Original Links

Scribe Healthcare Technologies Inc. has released revolutionary speech recognition software called Scribe Interactive. Since physicians and healthcare personnel need to adapt to cost-efficient ways of practicing healthcare, Scribe Interactive is one tool that can immediately generate a transcription layout from a dictation. With Scribe Interactive, Scribe's built-in M*Modal Speech Recognition Engine leverages existing voice profiles to accomplish this. Even without a recognized voice profile, Scribe Interactive allows users to create verbal snippets for efficiency. There are many added features to this tool.