

Google's voice AI is more human than ever before

#artificialintelligence

You might have watched a movie like The Terminator or I, Robot and considered that the artificial intelligence it portrays is a far cry from our current technologies (there's no real fear of bots powered by Samsung Bixby overtaking the planet, that's for sure). After investigating a recently published Google research paper (via Quartz), it looks like we might be closer to this reality than you might think. The paper, titled "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions," highlights a new Google text-to-speech system called Tacotron 2, which is capable of a near-human level of AI voice reproduction. To achieve this, Tacotron 2 uses a pair of neural networks: one to create a visual representation of specific audio frequencies and a second (called "WaveNet") to recreate this visual data as sound. Google launched a website alongside the paper to show off what this tech could lead to in practice; there, Google provides examples of how Tacotron 2 handles phrase semantics (like distinguishing between the noun and verb of "present"), intonation, and difficult words that might trip some of us humans up, like "otolaryngology."
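
The "visual representation of specific audio frequencies" named in the paper's title is a mel spectrogram, whose frequency bands are spaced on the perceptual mel scale rather than evenly in hertz. The sketch below shows one commonly used Hz-to-mel formula and how band centres spread out at higher frequencies. The function names are made up for illustration, and the snippet above does not say which mel variant Tacotron 2 itself uses.

```python
import math

def hz_to_mel(hz):
    """Convert hertz to mels using one common formula (an assumption here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centres(f_min, f_max, n_bands):
    """Return n_bands frequencies in Hz that are evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands - 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands)]

if __name__ == "__main__":
    # Low bands end up close together in Hz, high bands far apart.
    for f in mel_band_centres(0.0, 8000.0, 6):
        print(round(f, 1))
```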


Deep Optimization for Spectrum Repacking

Communications of the ACM

Over 13 months in 2016–17 the U.S. Federal Communications Commission conducted an "incentive auction" to repurpose radio spectrum from broadcast television to wireless internet. In the end, the auction yielded $19.8 bn, $10.05 bn of which was paid to 175 broadcasters for voluntarily relinquishing their licenses across 14 Ultra High Frequency (UHF) channels. Stations that continued broadcasting were assigned potentially new channels to fit as densely as possible into the channels that remained. The government netted more than $7 bn (used to pay down the national debt) after covering costs (including retuning). A crucial element of the auction design was the construction of a solver, dubbed SAT-based Feasibility Checker (SATFC), that determined whether sets of stations could be "repacked" in this way; it needed to run every time a station was given a price quote.
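
The repacking question can be pictured as a constraint problem: give every remaining station a channel so that no two interfering stations share one. The sketch below is a toy feasibility check that uses plain backtracking rather than the SAT encoding SATFC is named for, and it models only same-channel interference. The function name, station labels, and channel numbers are invented for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

def repack(
    stations: List[str],
    channels: List[int],
    interference: Set[Tuple[str, str]],
) -> Optional[Dict[str, int]]:
    """Return a channel per station with no interfering pair sharing a
    channel, or None if no such assignment exists."""
    # Build a symmetric "who interferes with whom" map.
    neighbours: Dict[str, Set[str]] = {s: set() for s in stations}
    for a, b in interference:
        neighbours[a].add(b)
        neighbours[b].add(a)

    assignment: Dict[str, int] = {}

    def solve(i: int) -> bool:
        if i == len(stations):
            return True  # every station has a channel
        s = stations[i]
        for ch in channels:
            # Unassigned neighbours return None, which never equals ch.
            if all(assignment.get(n) != ch for n in neighbours[s]):
                assignment[s] = ch
                if solve(i + 1):
                    return True
                del assignment[s]  # undo and try the next channel
        return False

    return dict(assignment) if solve(0) else None

if __name__ == "__main__":
    pairs = {("A", "B"), ("B", "C"), ("A", "C")}  # three mutually interfering stations
    print(repack(["A", "B", "C"], [14, 15], pairs))      # too few channels
    print(repack(["A", "B", "C"], [14, 15, 16], pairs))  # feasible
```

A check of this kind would have to be answered each time a price quote is considered, which is why the speed of the solver matters so much in the description above.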


Halide

Communications of the ACM

Writing high-performance code on modern machines requires not just locally optimizing inner loops, but globally reorganizing computations to exploit parallelism and locality--doing things such as tiling and blocking whole pipelines to fit in cache. This is especially true for image processing pipelines, where individual stages do much too little work to amortize the cost of loading and storing results to and from off-chip memory. As a result, the performance difference between a naive implementation of a pipeline and one globally optimized for parallelism and locality is often an order of magnitude. However, using existing programming tools, writing high-performance image processing code requires sacrificing simplicity, portability, and modularity. We argue that this is because traditional programming models conflate the computations defining the algorithm with decisions about intermediate storage and the order of computation, which we call the schedule. We propose a new programming language for image processing pipelines, called Halide, that separates the algorithm from its schedule. Programmers can change the schedule to express many possible organizations of a single algorithm. The Halide compiler then synthesizes a globally combined loop nest for an entire algorithm, given a schedule. Halide models a space of schedules which is expressive enough to describe organizations that match or outperform state-of-the-art hand-written implementations of many computational photography and computer vision algorithms. Its model is simple enough to do so often in only a few lines of code, and small changes generate efficient implementations for x86, ARM, Graphics Processors (GPUs), and specialized image processors, all from a single algorithm. 
Halide has been public and open source for over four years, during which it has been used by hundreds of programmers to deploy code to tens of thousands of servers and hundreds of millions of phones, processing billions of images every day. Computational photography and computer vision algorithms require highly efficient implementations to be used in practice, from power-constrained mobile devices to data centers processing billions of images. This is not a simple matter of programming in a low-level language such as C: even in C, the performance difference between naïve and highly optimized image processing code for the same algorithm is often an order of magnitude. Unfortunately, optimization usually comes at a large cost in programmer time and code complexity, as computation must be globally reorganized to efficiently exploit the memory system (locality, e.g., in caches) and many execution units (parallelism, e.g., across threads and vector lanes). Image processing pipelines are both wide and deep: they consist of many data-parallel stages that benefit hugely from parallel execution across pixels, but stages are often memory bandwidth limited--they do little work per load and store.
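
The split between algorithm and schedule can be illustrated without Halide itself. In the Python sketch below, which is not Halide code and uses invented function names, a two-stage 1-D pipeline is defined once and then run under two hypothetical schedules: one stores the whole intermediate stage before the consumer runs, the other recomputes the producer wherever the consumer needs it. Both give the same output; they differ only in storage and redundant work.

```python
def clamp(i, n):
    """Clamp index i into the valid range [0, n - 1]."""
    return max(0, min(i, n - 1))

# --- Algorithm: what is computed (the same under every schedule) ---

def producer(inp, x):
    """Stage 1: 3-tap sum around x with clamped borders."""
    n = len(inp)
    return inp[clamp(x - 1, n)] + inp[x] + inp[clamp(x + 1, n)]

def consumer(get, x, n):
    """Stage 2: 3-tap sum over stage 1 values, read through `get`."""
    return get(clamp(x - 1, n)) + get(x) + get(clamp(x + 1, n))

# --- Schedules: when and where stage 1 is computed ---

def run_breadth_first(inp):
    """Compute and store all of stage 1, then run stage 2."""
    n = len(inp)
    tmp = [producer(inp, x) for x in range(n)]  # n producer evaluations
    out = [consumer(tmp.__getitem__, x, n) for x in range(n)]
    return out, n

def run_inlined(inp):
    """Recompute stage 1 on demand: no buffer, but redundant work."""
    n = len(inp)
    calls = 0

    def get(x):
        nonlocal calls
        calls += 1
        return producer(inp, x)

    out = [consumer(get, x, n) for x in range(n)]
    return out, calls

if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(run_breadth_first(data))  # same output, 4 producer evaluations
    print(run_inlined(data))        # same output, 12 producer evaluations
```

Tiled or sliding-window schedules sit between these two extremes, trading some recomputation for better locality.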


You don't need a PhD to grasp the anxieties around sex robots

Engadget

NSFW Warning: This story may contain links to and descriptions or images of explicit sexual acts. If you want to understand the myriad issues concerning sex robots that humanity needs to grapple with, you have two options. You can either spend several years studying for a PhD in either of those fields, or you can sit down in front of your TV. Many of the preoccupations that were on display at the third International Congress on Love and Sex with Robots are ones that have already been explored in pop culture. From Futurama to Westworld, going back to Weird Science and The Stepford Wives, the questions that academics are currently pondering have already been played out, fictionally at least, on TV.


Introduction to Natural Language Processing (NLP) - Algorithmia Blog

@machinelearnbot

The field of study that focuses on the interactions between human language and computers is called Natural Language Processing, or NLP for short. It sits at the intersection of computer science, artificial intelligence, and computational linguistics (Wikipedia). "Natural Language Processing is a field that covers computer understanding and manipulation of human language, and it's ripe with possibilities for news gathering," Anthony Pesce said in Natural Language Processing in the kitchen. "You usually hear about it in the context of analyzing large pools of legislation or other document sets, attempting to discover patterns or root out corruption." NLP is a way for computers to analyze, understand, and derive meaning from human language in a smart and useful way.


The Most-read WIRED Business Stories of 2017

WIRED

Looking back at the year's most-read WIRED business stories, one theme clearly emerges: people are very concerned with the future of work. Will the robot revolution eradicate positions? What are the right skills for future-proofing ourselves? Other stories captured our readers' attention too, of course. People seem to be equally concerned with how social media is retraining human brains and upending social norms.


The promise of AI in audio processing – Towards Data Science

#artificialintelligence

We have seen a rise of AI technologies for image and video processing. Even though things tend to take a little longer to reach the world of audio, here too we have seen impressive technological advances. In this article, I will summarize some of these advances, outline further potential for AI in audio processing, and describe some of the pitfalls and challenges we might encounter in pursuing this cause. What sparked my interest in AI use cases for audio processing was the publication of Google DeepMind's "WaveNet", a deep learning model for generating audio recordings [1], which was released toward the end of 2016. Using an adapted network architecture, a dilated convolutional neural network, DeepMind researchers succeeded in generating very convincing text-to-speech and some interesting music-like recordings trained from classical piano recordings.
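
To make "dilated convolutional neural network" concrete, here is a minimal sketch of a causal dilated 1-D convolution in plain Python. It leaves out everything else a real WaveNet layer contains (learned weights, gating, residual connections); the function names and the all-ones filter are illustrative assumptions. Stacking layers whose dilation doubles each time lets the number of past samples a single output can see grow exponentially with depth.

```python
def dilated_causal_conv(signal, weights, dilation):
    """y[t] = sum_k weights[k] * signal[t - k * dilation].

    Samples before t = 0 are treated as zero, so each output depends
    only on the current and earlier inputs (causality).
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample."""
    return 1 + (kernel_size - 1) * sum(dilations)

if __name__ == "__main__":
    x = [1.0] + [0.0] * 7        # a single impulse
    for d in (1, 2, 4):          # dilation doubles per layer
        x = dilated_causal_conv(x, [1.0, 1.0], d)
    print(x)                     # the impulse has spread over 8 samples
    print(receptive_field(2, [1, 2, 4]))
```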


When Star Trek's Spock Met PLATO

WIRED

The first thing they noticed was the ears. They were just plain wrong. They seemed tiny--really, really tiny--when they were supposed to be big and pointy. That would be strike one against him: the absence of that universal trademark, his Vulcan ears. And what was he doing with that scruffy beard?


Flipboard on Flipboard

#artificialintelligence

There are more than 8,000 online courses out there. These are some of the best. More than 50 million students signed up for one this year. When scientists announce they've made a breakthrough, they usually promise we'll see the full effects of those discoveries--anything from a better understanding of how the universe works to a drug ready for use in patients--in about five years. TULSA -- Tom Coomer has retired twice: once when he was 65, and then several years ago.