Deep Representation-Decoupling Neural Networks for Monaural Music Mixture Separation

AAAI Conferences

Monaural source separation (MSS) aims to extract and reconstruct different sources from a single-channel mixture, which could facilitate a variety of applications such as chord recognition, pitch estimation and automatic transcription. In this paper, we study the problem of separating vocals and instruments from a monaural music mixture. Existing works for monaural source separation either utilize linear and shallow models (e.g., non-negative matrix factorization) or do not explicitly address the coupling and tangling of multiple sources in the original input signals; hence they do not perform satisfactorily in real-world scenarios. To overcome these limitations, we propose a novel end-to-end framework for monaural music mixture separation called Deep Representation-Decoupling Neural Networks (DRDNN). DRDNN takes advantage of both traditional signal processing methods and popular deep learning models. For each input music mixture, DRDNN converts it to a two-dimensional time-frequency spectrogram using the short-time Fourier transform (STFT), followed by stacked convolutional neural network (CNN) layers and long short-term memory (LSTM) layers to extract more condensed features. Afterwards, DRDNN utilizes a decoupling component, which consists of a group of multi-layer perceptrons (MLPs), to decouple the features further into different separated sources. The design of the decoupling component in DRDNN produces purified single-source signals for subsequent full-size restoration and can significantly improve the performance of the final separation. Through extensive experiments on a real-world dataset, we show that DRDNN outperforms state-of-the-art baselines in the task of monaural music mixture separation and reconstruction.
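The first stage described in this abstract, turning a waveform into a time-frequency spectrogram with the STFT, can be illustrated with a minimal standard-library sketch. It is not the DRDNN implementation: the frame length, hop size, Hann window, the toy two-sinusoid "mixture" and the names `hann` and `stft_magnitude` are all assumptions chosen for illustration, and the CNN, LSTM and MLP stages are not shown.

```python
import cmath
import math


def hann(length):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram as a list of frames.

    Each frame holds frame_len // 2 + 1 non-negative frequency bins,
    computed with a naive DFT (fine for a toy example, slow for real audio).
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Toy mixture (assumption): a strong sinusoid centred on bin 4
# plus a weaker one centred on bin 10.
mixture = [
    math.sin(2 * math.pi * 4 * t / 64) + 0.5 * math.sin(2 * math.pi * 10 * t / 64)
    for t in range(256)
]
spectrogram = stft_magnitude(mixture)
strongest_bin = max(range(len(spectrogram[0])), key=lambda k: spectrogram[0][k])
```

Here `spectrogram` is a time-by-frequency grid in which the two sources occupy different frequency bins. A two-dimensional representation of this kind is what the abstract says is passed to the convolutional layers.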


MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment

AAAI Conferences

Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, so imposing a chronological ordering of notes is not naturally suitable. In this paper, we propose three models for symbolic multi-track music generation under the framework of generative adversarial networks (GANs). The three models, which differ in their underlying assumptions and accordingly in their network architectures, are referred to as the jamming model, the composer model and the hybrid model. We trained the proposed models on a dataset of over one hundred thousand bars of rock music and applied them to generate piano-rolls of five tracks: bass, drums, guitar, piano and strings. A few intra-track and inter-track objective metrics are also proposed to evaluate the generative results, in addition to a subjective user study. We show that our models can generate coherent music of four bars right from scratch (i.e., without human inputs). We also extend our models to human-AI cooperative music generation: given a specific track composed by a human, we can generate four additional tracks to accompany it. All code, the dataset and the rendered audio samples are available at https://salu133445.github.io/musegan/.
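The multi-track piano-roll representation mentioned in this abstract can be pictured with a small sketch. The five track names and the four-bar length come from the abstract. The time resolution (`STEPS_PER_BAR`), the 128-pitch range and the helper names are assumptions for illustration only and are not taken from the MuseGAN code.

```python
TRACKS = ["bass", "drums", "guitar", "piano", "strings"]  # from the abstract
BARS = 4             # from the abstract
STEPS_PER_BAR = 16   # assumption: time steps per bar
N_PITCHES = 128      # assumption: MIDI-style pitch range


def empty_piano_roll():
    """One binary time-by-pitch grid per track, all notes off."""
    steps = BARS * STEPS_PER_BAR
    return {
        track: [[0] * N_PITCHES for _ in range(steps)]
        for track in TRACKS
    }


def add_note(roll, track, pitch, start, duration):
    """Switch a pitch on for `duration` steps, clipped to the end of the roll."""
    end = min(start + duration, len(roll[track]))
    for step in range(start, end):
        roll[track][step][pitch] = 1


def active_pitches(roll, track, step):
    """Pitches sounding on a track at one time step."""
    return [pitch for pitch, on in enumerate(roll[track][step]) if on]


roll = empty_piano_roll()
# A C major chord on piano: three simultaneous notes with no inherent order.
for pitch in (60, 64, 67):
    add_note(roll, "piano", pitch, start=0, duration=4)
# A sustained bass note running through the first bar.
add_note(roll, "bass", 36, start=0, duration=16)
```

Because a chord is simply several cells set in the same time step, this grid avoids forcing the notes into a chronological sequence, which is the difficulty the abstract raises.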


Go With the Flow, on Jupiter and Snow. Coherence From Model-Free Video Data without Trajectories

arXiv.org Machine Learning

In a data set such as the clouds of Jupiter, coherence is readily apparent to human observers, especially in the Great Red Spot but also in other great storms and persistent structures. There are now many different definitions and perspectives that describe coherent structures mathematically, but we take an image processing perspective here. We describe the inference of coherent sets of a fluidic system directly from image data, without first attempting to model the underlying flow fields, which relates to a concept in image processing called motion tracking. In contrast to standard spectral methods for image processing, which are generally based on a symmetric affinity matrix and lead to standard spectral graph theory, we need an asymmetric affinity, which arises naturally from the underlying arrow of time. We develop an anisotropic, directed diffusion operator corresponding to flow on a directed graph, built from a directed affinity matrix developed with coherence in mind, together with the corresponding spectral graph theory from the graph Laplacian. Our methodology is not offered as more accurate than traditional methods of finding coherent sets; rather, our approach works with alternative kinds of data sets, in the absence of a vector field. Our examples include partitioning the weather and cloud structures of Jupiter and a lake-effect snow event local to Potsdam, N.Y., on Earth, as well as the benchmark double-gyre system.
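To make the idea of an asymmetric affinity concrete, the sketch below builds a small directed affinity matrix, normalizes it into a row-stochastic transition matrix and forms a random-walk graph Laplacian. The abstract does not specify the paper's actual operator, so this is only a generic textbook construction. The 3-node matrix and the function names are hypothetical.

```python
def row_normalize(affinity):
    """Turn a non-negative affinity matrix into a row-stochastic transition matrix."""
    transition = []
    for row in affinity:
        total = sum(row)
        if total == 0:
            raise ValueError("every node needs at least one outgoing edge")
        transition.append([weight / total for weight in row])
    return transition


def random_walk_laplacian(affinity):
    """Generic random-walk Laplacian L = I - P, with P the row-normalized affinity."""
    transition = row_normalize(affinity)
    size = len(transition)
    return [
        [(1.0 if i == j else 0.0) - transition[i][j] for j in range(size)]
        for i in range(size)
    ]


# Hypothetical directed affinity: entry [i][j] weights the edge from node i to node j.
# The matrix is deliberately not symmetric (for example, [0][1] != [1][0]).
affinity = [
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
]
transition = row_normalize(affinity)
laplacian = random_walk_laplacian(affinity)
```

With a symmetric affinity the usual undirected spectral graph theory applies. Here the direction of each edge is kept, which is what lets an operator of this kind encode the arrow of time.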


Police in China are scanning travelers with facial recognition glasses

Engadget

Police in China are now sporting glasses equipped with facial recognition devices and they're using them to scan train riders and plane passengers for individuals who may be trying to avoid law enforcement or are using fake IDs. So far, police have caught seven people connected to major criminal cases and 26 who were using false IDs while traveling, according to People's Daily. The Wall Street Journal reports that Beijing-based LLVision Technology Co. developed the devices. The company produces wearable video cameras as well and while it sells those to anyone, it's vetting buyers for its facial recognition devices. LLVision says that in tests, the system was able to pick out individuals from a database of 10,000 people and it could do so in 100 milliseconds.


What Is Project Yeti? Google Working On Live Stream Gaming Service, Console Project

International Business Times

Google seems to be working on a subscription-based game streaming service called "Yeti," according to a Wednesday report by The Information. Google could also launch a gaming console under its Made by Google department, sources familiar with the matter told the news site. The subscription game service could work on Google's Chromecast and possibly with the rumored console. The project has gone through multiple iterations, including one that would have worked with the Chromecast, the report said. The Made by Google console would heighten Google's push for centering its products in consumers' homes.


Reddit bans the 'deepfake' AI porn it helped spawn

Engadget

Hot on the heels of Twitter, Reddit has updated its rules to expressly ban AI-generated "deepfake" porn. Where it previously had a single rule forbidding porn and suggestive material involving minors, it now has two -- and it's clear that you're not allowed to post "depictions that have been faked." Accordingly, Reddit has cracked down on some of the offending communities. It has shut down the deepfakes subreddit that got the ball rolling, as well as YouTubefakes. It hasn't closed non-deepfake subreddits like CelebFakes, however, and it's also maintaining the communities with more innocuous intentions, such as FakeApp (the program itself) and SFWdeepfakes. At the moment, this is more about addressing the specific violations that triggered the uproar than about stamping out every potential violation of the policy.


Top Tech Trends Manufacturers Need to Watch in 2018 | Rootstock Software

#artificialintelligence

An old episode of The Simpsons predicted how smartphones would someday need to self-correct annoying spelling mishaps on the phone's keyboard. "Lisa on Ice" – a Season 6, Episode 8 show that aired way back in 1994 – opens in a Springfield Elementary School assembly where Kearney asks fellow bully Dolph to take a memo on his Newton to "Beat up Martin." When the machine translates the message into "Eat up Martha," it shows how common text-input errors can be blamed on a device's lack of autocorrect technology. By 2013, Apple had perfected the autocorrect technology for smartphone keyboards. Nitin Ganatra, Apple's former director of engineering for iOS applications, explained, "If you heard people talking and they used the words Eat up Martha, it was basically a reference to the fact that we needed to nail the keyboard. We needed to make sure the text input works on this thing, otherwise – Here comes the Eat up Martha's."


I taught an AI to shave Henry Cavill's mustache

#artificialintelligence

Visit https://www.deepfakes.club to learn how you can start using these techniques with free software. The deepfakes algorithm is not just for face-swapping but can produce visual effects that would normally be quite costly to implement. This demo showcases the mustache-removal abilities of a trained neural network. Mustachegate involved actor Henry Cavill sporting a mustache during reshoots as Superman in the film Justice League. A competing studio would not allow him to shave his mustache.


Google is reportedly working on a video game streaming service

Engadget

It sounds like Google might be working on a game streaming service. According to a report from The Information, the tech juggernaut has been floating the idea for a streaming service (like PlayStation Now or NVIDIA's GeForce Now) for around two years. The service is codenamed "Yeti" and Google is apparently even testing hardware for it as well. The Information's sources say that the service might stream to a Chromecast, and that hiring Phil Harrison last month as VP of hardware -- formerly of Microsoft and Sony's gaming divisions -- could point toward a standalone gaming console. You probably shouldn't get your hopes up yet, though.


Meet Erica, Japan's Next Robot News Anchor

#artificialintelligence

At a mere 23 years old, Japan's latest news anchor would make her parents proud -- if she had any. Erica, a lifelike android designed to look like a 23-year-old woman, may soon become a TV news anchor in Japan, the Wall Street Journal reported. According to Hiroshi Ishiguro, director of the Intelligent Robotics Laboratory at Osaka University and Erica's creator, the android will replace a human news anchor on the airwaves as soon as April, the Daily Mail said. Erica the android may be well suited for this desk job. For starters, she can capably recite scripted writing and sit in a chair, making her about as qualified for television as most humans.