Deep Learning
This Artificially Intelligent Robot Composes and Performs Its Own Music
Shimon, a four-armed marimba-playing robot, has been around for years, but its developers at Georgia Tech have recently taken this futuristic musical machine to the next level. Using deep learning, the robot can now study large datasets from well-known musicians, and then produce and perform its own original compositions. Shimon was originally developed by Gil Weinberg, director of Georgia Tech's Center for Music Technology. Under its original programming, the robot was capable of improvising music as it played alongside human performers, using an "interestingness" algorithm to make sure it wasn't just copying its bandmates. But now, thanks to the efforts of Ph.D. student Mason Bretan, Shimon has become an accomplished composer, capable of autonomously generating the melodic and harmonic structure of a song.
Understanding the limits of deep learning
Google replaced Google Translate's architecture with neural networks, and now machine translation is also closing in on human performance. At the recent AI By The Bay conference, Francois Chollet emphasized that deep learning is simply more powerful pattern recognition than previous statistical and machine learning methods. "The most important problem for AI today is abstraction and reasoning," explains Chollet, an AI researcher at Google and famed inventor of the widely used deep learning library Keras. While neural networks achieve statistically impressive results across large sample sizes, they are "individually unreliable" and often make mistakes humans would never make, such as classifying a toothbrush as a baseball bat.
Microsoft AI gets maximum score possible on Ms. Pac-Man
Humans are now second-best at playing Ms. Pac-Man, a 1980s twist on the arcade classic that involves eating pellets while being chased by ghosts. It was rated as one of the hardest games for an AI to beat, but that didn't stop one from beating it. An AI from Microsoft's Maluuba team -- a Canadian deep learning startup the company acquired earlier this year -- has now reached the maximum possible score of 999,990 in the Atari game, four times the human record. This was achieved using a reinforcement learning method called Hybrid Reward Architecture. The team taught 150 AI agents to work together in parallel to master the game.
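As a rough illustration of many agents working in parallel toward one decision, the sketch below has each agent score every action for its own narrow goal, and an aggregator add the scores up before picking a move. This is a minimal sketch under stated assumptions, not Maluuba's implementation: the function names, the two example agents, the weights, and all numbers are hypothetical.

```python
def aggregate_q_values(agent_q_values, weights=None):
    """Combine per-agent action scores into one total score per action.

    agent_q_values: list of dicts mapping action -> that agent's estimate.
    weights: optional per-agent weights (hypothetical; defaults to 1.0 each).
    """
    if weights is None:
        weights = [1.0] * len(agent_q_values)
    actions = agent_q_values[0].keys()
    return {
        action: sum(w * q[action] for w, q in zip(weights, agent_q_values))
        for action in actions
    }


def choose_action(agent_q_values, weights=None):
    """Pick the action with the highest combined score."""
    totals = aggregate_q_values(agent_q_values, weights)
    return max(totals, key=totals.get)


# Hypothetical agents: one cares only about a nearby pellet,
# the other only about avoiding a nearby ghost.
pellet_agent = {"left": 1.0, "right": 0.2, "up": 0.0}
ghost_agent = {"left": -5.0, "right": 0.0, "up": -0.5}

best = choose_action([pellet_agent, ghost_agent])
```

With these made-up numbers the pellet agent alone would go left, but the ghost agent's strong penalty on that move shifts the combined choice to the right.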
Sorry humans, Microsoft's AI is the first to reach a perfect Ms. Pac-Man score
At long last, the perfect score for arcade classic Ms. Pac-Man has been achieved, though not by a human. Maluuba -- a deep learning team acquired by Microsoft in January -- has created an AI system that's learned how to reach the game's maximum point value of 999,990 on Atari 2600, using a unique combination of reinforcement learning with a divide-and-conquer method. AI researchers have a documented penchant for using video games to test machine learning; they mimic real-world chaos in a controlled environment better than more static games like chess. In 2015, Google's DeepMind AI was able to learn how to master 49 Atari games using reinforcement learning, which provides positive or negative feedback each time the AI attempts to solve a problem. Though AI has conquered a wealth of retro games, Ms. Pac-Man has remained elusive for years, due to the game's intentional lack of predictability.
A new trick for calculating Jacobian vector products
If you have any questions about this post please ask on the discussion thread on /r/machinelearning. For a solid introduction to Automatic Differentiation, which is the subject of this blog post, see Automatic differentiation in machine learning: a survey. Last week I was involved in a heated discussion thread over on the Autograd issue tracker. I'd recently been working on an implementation of forward mode automatic differentiation, which fits into Autograd's system for differentiating Python/Numpy code. Our discussion was about the usefulness of forward mode, which is equivalent to Theano's Rop and in the general case is used to calculate directional derivatives, or equivalently for calculating Jacobian vector products.
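For readers new to forward mode, here is a minimal sketch of how a Jacobian vector product can be computed by carrying a tangent alongside each value (dual numbers). It is not the trick the post goes on to describe, and it does not use Autograd or Theano; the Dual class, the jvp helper, and the example function are all hypothetical names introduced for illustration.

```python
import math


class Dual:
    """A value paired with its directional derivative (tangent)."""

    def __init__(self, value, tangent=0.0):
        self.value = value
        self.tangent = tangent

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(
            self.value * other.value,
            self.tangent * other.value + self.value * other.tangent,
        )

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants with zero tangent."""
    return x if isinstance(x, Dual) else Dual(x)


def sin(x):
    """Sine with the chain rule applied to the tangent."""
    return Dual(math.sin(x.value), math.cos(x.value) * x.tangent)


def jvp(f, primals, tangents):
    """Evaluate f at primals and return (outputs, J @ tangents) in one pass."""
    duals = [Dual(p, t) for p, t in zip(primals, tangents)]
    outputs = f(*duals)
    return [o.value for o in outputs], [o.tangent for o in outputs]


def example(x, y):
    """Hypothetical test function R^2 -> R^2."""
    return [x * y, sin(x) + 3.0 * y]


# Directional derivative of example at (2, 5) along the direction (1, 0).
values, directional = jvp(example, [2.0, 5.0], [1.0, 0.0])
```

The Jacobian is never materialized: a single forward pass yields the product of the Jacobian with the chosen direction, which is why forward mode suits directional derivatives.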
Canadian AI company raises historic amount of funding
Canada has developed into an artificial intelligence (AI) research powerhouse, and it isn't slowing down any time soon if new reports are any indication. Element AI, a Montreal-based AI company focused on enterprise solutions, announced that it has raised USD 102 million (CAD 137.5 million) in its first round of venture capital funding, the largest for any AI company in history. The funding will be used to ramp up research, accelerate enterprise AI adoption, invest in large-scale AI projects globally, and create 250 new jobs in the Canadian high tech sector by January 2018. "As we've been launching the business in the last year, we've been overwhelmed by the positive feedback and the level of interest in the market for the services we're working on," CEO and co-founder Jean-François Gagné tells IT World Canada. "We received funding from large corporations and startups across the globe and we're going to use this to scale up our business and work on new projects."
Robot Uses Deep Learning and Big Data to Write and Play its Own Music
Shimon, a four-armed, marimba-playing robot, is writing and playing its own music using deep learning. This is the first of its two songs. A marimba-playing robot with four arms and eight sticks is writing and playing its own compositions in a lab at the Georgia Institute of Technology. The pieces are generated using artificial intelligence and deep learning. Researchers fed the robot nearly 5,000 complete songs -- from Beethoven to the Beatles to Lady Gaga to Miles Davis -- and more than 2 million motifs, riffs and licks of music.
Raw Waveform-based Speech Enhancement by Fully Convolutional Networks
Fu, Szu-Wei, Tsao, Yu, Lu, Xugang, Kawai, Hisashi
This study proposes a fully convolutional network (FCN) model for raw waveform-based speech enhancement. The proposed system performs speech enhancement in an end-to-end (i.e., waveform-in and waveform-out) manner, which differs from most existing denoising methods that process the magnitude spectrum (e.g., log power spectrum (LPS)) only. Because the fully connected layers, which are involved in deep neural networks (DNN) and convolutional neural networks (CNN), may not accurately characterize the local information of speech signals, particularly with high frequency components, we employed fully convolutional layers to model the waveform. More specifically, FCN consists of only convolutional layers and thus the local temporal structures of speech signals can be efficiently and effectively preserved with relatively few weights. Experimental results show that DNN- and CNN-based models have limited capability to restore high frequency components of waveforms, thus leading to decreased intelligibility of enhanced speech. By contrast, the proposed FCN model can not only effectively recover the waveforms but also outperform the LPS-based DNN baseline in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ). In addition, the number of model parameters in FCN is only approximately 0.2% of that in both DNN and CNN.
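To make the "relatively few weights" point concrete, the sketch below applies one shared kernel along a waveform (waveform in, waveform out, same length) and compares weight counts for a convolutional layer and a fully connected layer. It is a toy illustration and not the paper's model: the function names, the 512-sample frame, and the kernel size of 11 are assumptions chosen only for the arithmetic.

```python
def conv1d_same(signal, kernel):
    """Slide one shared kernel over the signal with zero padding,
    so the output has the same length as the input."""
    if len(kernel) % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel length")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def conv_weight_count(kernel_size, in_channels, out_channels):
    """Weights in one 1-D convolutional layer (biases ignored)."""
    return kernel_size * in_channels * out_channels


def dense_weight_count(n_inputs, n_outputs):
    """Weights in one fully connected layer (biases ignored)."""
    return n_inputs * n_outputs


# Each output sample depends only on its local neighbourhood.
smoothed = conv1d_same([1.0, 2.0, 3.0, 4.0], [0.5, 0.0, 0.5])

# Hypothetical sizes: a 512-sample frame mapped to 512 samples.
conv_weights = conv_weight_count(kernel_size=11, in_channels=1, out_channels=1)
dense_weights = dense_weight_count(512, 512)
```

Because the same kernel is reused at every time step, the convolutional layer's weight count does not grow with the frame length, whereas the fully connected layer's count grows with the product of input and output sizes.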
One Model To Learn Them All
Kaiser, Lukasz, Gomez, Aidan N., Shazeer, Noam, Vaswani, Ashish, Parmar, Niki, Jones, Llion, Uszkoreit, Jakob
Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all.
Deep Generative Models for Relational Data with Side Information
Hu, Changwei, Rai, Piyush, Carin, Lawrence
We present a probabilistic framework for overlapping community discovery and link prediction for relational data, given as a graph. The proposed framework has: (1) a deep architecture which enables us to infer multiple layers of latent features/communities for each node, providing superior link prediction performance on more complex networks and better interpretability of the latent features; and (2) a regression model which allows directly conditioning the node latent features on the side information available in the form of node attributes. Our framework handles both (1) and (2) via a clean, unified model, which enjoys full local conjugacy via data augmentation, and facilitates efficient inference via closed form Gibbs sampling. Moreover, inference cost scales in the number of edges, which is attractive for massive but sparse networks. Our framework is also easily extendable to model weighted networks with count-valued edges. We compare with various state-of-the-art methods and report results, both quantitative and qualitative, on several benchmark data sets.