
3 things Will Douglas Heaven is into right now

MIT Technology Review

MIT Technology Review's senior editor for AI shares what he's been thinking about lately. My daughter introduced me to El Estepario Siberiano's YouTube channel a few months back, and I have been obsessed ever since. The Spanish drummer (real name: Jorge Garrido) posts videos of himself playing supercharged cover versions of popular tracks, hitting his drums with such jaw-dropping speed and technique that he makes other pro drummers shake their heads in disbelief. The dozens of reaction videos posted by other musicians are a joy in themselves. Garrido is up-front about the countless hours that it took to get this good. He says he sat behind his kit almost all day, every day for years.


DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation

Wang, Ming, Wang, Fang, Hu, Minghao, He, Li, Wang, Haiyang, Zhang, Jun, Yan, Tianwei, Li, Li, Luo, Zhunchen, Luo, Wei, Bai, Xiaoying, Geng, Guotong

arXiv.org Artificial Intelligence

Long-form article generation (LFAG) presents challenges such as maintaining logical consistency, comprehensive topic coverage, and narrative coherence across extended articles. Existing datasets often lack both the hierarchical structure and fine-grained annotation needed to effectively decompose tasks, resulting in shallow, disorganized article generation. To address these limitations, we introduce DeFine, a Decomposed and Fine-grained annotated dataset for long-form article generation. DeFine is characterized by its hierarchical decomposition strategy and the integration of domain-specific knowledge with multi-level annotations, ensuring granular control and enhanced depth in article generation. To construct the dataset, a multi-agent collaborative pipeline is proposed, which systematically segments the generation process into four parts: Data Miner, Cite Retriever, Q&A Annotator and Data Cleaner. To validate the effectiveness of DeFine, we designed and tested three LFAG baselines: web retrieval, local retrieval, and grounded reference. We fine-tuned the Qwen2-7b-Instruct model using the DeFine training dataset. The experimental results showed significant improvements in text quality, specifically in topic coverage, depth of information, and content fidelity. Our dataset is publicly available to facilitate future research.
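The four-part pipeline named in the abstract can be pictured as a chain of stages, each enriching a shared record. The sketch below is a minimal illustration: the stage names follow the abstract, but every function body is a placeholder assumption, not the authors' implementation.

```python
def data_miner(topic):
    # Stage 1: gather raw domain material for the topic (placeholder).
    return {"topic": topic, "raw": ["doc_a", "doc_b"]}

def cite_retriever(record):
    # Stage 2: attach a supporting citation to each mined document (placeholder).
    record["citations"] = {doc: f"cite:{doc}" for doc in record["raw"]}
    return record

def qa_annotator(record):
    # Stage 3: add fine-grained Q&A annotations over the material (placeholder).
    record["qa"] = [("What does doc_a cover?", "...")]
    return record

def data_cleaner(record):
    # Stage 4: mark the record as filtered/validated before release (placeholder).
    record["clean"] = True
    return record

def build_example(topic):
    # Run the four stages in order, threading one record through all of them.
    record = data_miner(topic)
    for stage in (cite_retriever, qa_annotator, data_cleaner):
        record = stage(record)
    return record

example = build_example("long-form article generation")
print(sorted(example))  # ['citations', 'clean', 'qa', 'raw', 'topic']
```

The point of the structure is that each stage adds one annotation layer, which is what gives the dataset its multi-level, fine-grained character.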


Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

Shao, Yijia, Jiang, Yucheng, Kanell, Theodore A., Xu, Peter, Khattab, Omar, Lam, Monica S.

arXiv.org Artificial Intelligence

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. STORM models the pre-writing stage by (1) discovering diverse perspectives in researching the given topic, (2) simulating conversations where writers carrying different perspectives pose questions to a topic expert grounded on trusted Internet sources, (3) curating the collected information to create an outline. For evaluation, we curate FreshWiki, a dataset of recent high-quality Wikipedia articles, and formulate outline assessments to evaluate the pre-writing stage. We further gather feedback from experienced Wikipedia editors. Compared to articles generated by an outline-driven retrieval-augmented baseline, more of STORM's articles are deemed to be organized (by a 25% absolute increase) and broad in coverage (by 10%). The expert feedback also helps identify new challenges for generating grounded long articles, such as source bias transfer and over-association of unrelated facts.
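STORM's three pre-writing steps (discover perspectives, simulate grounded conversations, curate an outline) form a simple loop. The following sketch mirrors that loop only in shape; every helper here is a hypothetical placeholder (in the real system the perspectives, questions, and answers come from LLMs and retrieval over trusted sources).

```python
def discover_perspectives(topic):
    # Step 1: find diverse angles on the topic (placeholder list).
    return ["historian", "engineer", "critic"]

def simulate_conversation(perspective, topic, turns=2):
    # Step 2: a writer holding this perspective questions a topic expert;
    # in STORM the answers are grounded in trusted Internet sources.
    dialogue = []
    for i in range(turns):
        question = f"[{perspective}] question {i} about {topic}"
        answer = f"grounded answer {i}"  # placeholder for a retrieved answer
        dialogue.append((question, answer))
    return dialogue

def curate_outline(conversations):
    # Step 3: fold the collected Q&A into a section outline (placeholder).
    return [f"Section: {p}" for p in conversations]

topic = "example topic"
convs = {p: simulate_conversation(p, topic) for p in discover_perspectives(topic)}
outline = curate_outline(convs)
print(outline)  # ['Section: historian', 'Section: engineer', 'Section: critic']
```

Separating research (steps 1-2) from organization (step 3) is what lets the system be evaluated on outline quality before any article text is written.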


Flat Latent Manifolds for Human-machine Co-creation of Music

Chen, Nutan, Benbouzid, Djalel, Ferroni, Francesco, Nitschke, Mathis, Pinna, Luciano, van der Smagt, Patrick

arXiv.org Artificial Intelligence

The use of machine learning in artistic music generation leads to controversial discussions of the quality of art, for which objective quantification is nonsensical. We therefore consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal interplay is to lead to new experiences, both for the musician and the audience. To obtain this behaviour, we resort to the framework of recurrent Variational Auto-Encoders (VAE) and learn to generate music, seeded by a human musician. In the learned model, we generate novel musical sequences by interpolation in latent space. Standard VAEs, however, do not guarantee any form of smoothness in their latent representation. This translates into abrupt changes in the generated music sequences. To overcome these limitations, we regularise the decoder and endow the latent space with a flat Riemannian manifold, i.e., a manifold that is isometric to the Euclidean space. As a result, linearly interpolating in the latent space yields realistic and smooth musical changes that fit the type of machine-musician interactions we aim for. We provide empirical evidence for our method via a set of experiments on music datasets and we deploy our model for an interactive jam session with a professional drummer. The live performance provides qualitative evidence that the latent representation can be intuitively interpreted and exploited by the drummer to drive the interplay. Beyond the musical application, our approach showcases an instance of human-centred design of machine-learning models, driven by interpretability and the interaction with the end user.
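The core operation the abstract relies on, linear interpolation between two latent codes, is easy to sketch. This is a minimal illustration with made-up 8-dimensional codes, not the paper's model; the point is that on a flat (Euclidean-isometric) latent manifold, straight lines are geodesics, so equal steps along the line correspond to equal-sized changes in the decoded music.

```python
import numpy as np

def lerp(z_a, z_b, steps):
    """Linearly interpolate between two latent codes.

    Returns a (steps, dim) array of intermediate codes; each row would
    be fed to the decoder to produce one musical frame.
    """
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - t) * z_a + t * z_b for t in ts])

# Two hypothetical latent codes, e.g. seeded by two phrases a musician played.
z_start = np.zeros(8)
z_end = np.ones(8)
path = lerp(z_start, z_end, steps=5)
print(path.shape)  # (5, 8)
print(path[2])     # midpoint: all entries 0.5
```

In a standard VAE this straight-line path can cross low-density regions and produce abrupt decoded output; the paper's flat-manifold regularisation is what makes the simple `lerp` above musically meaningful.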


DrumNet

#artificialintelligence

Sony CSL Paris develops technology for AI-assisted music production. The goal is not to replace musicians, but to provide them with better tools to be more efficient in realizing their creative ideas. DrumNet is based on an artificial neural network which learns rhythmic relationships between different instruments and encodes these relationships in a 16-dimensional style space. A similar example is the Logic Pro X Drummer, which lets the user specify the playing style by navigating a two-dimensional space. Unlike the Logic Pro X Drummer, however, DrumNet dynamically adapts to the existing music.


The music moves us -- but how?

#artificialintelligence

Music and dance are so deeply embedded in the human experience that we almost take them for granted. They're distinct from one another, but intimately related: Music -- arrangements of sound over time -- causes us to move our bodies in space. Without knowing it, we track pulse, tempo and rhythm, and we move in response. But only recently have scientists developed the tools, and the inclination, to quantitatively study the human response to music in its many forms. It's a research program that relies on a wide array of approaches, employing techniques from the study of perception and cognition to those of neurobiology and neuroimaging, with additional insights from psychophysics, evolutionary psychology and animal studies.


Sony's new AI drummer could write beats for your band

#artificialintelligence

We already knew AIs could write jazz and death metal tracks on their own, and even create infinite remixes of others' songs. Now, Sony has created an AI that can collaborate with other musicians, producing drum beats for songs in a variety of genres that you can listen to here -- making it the latest example of AI's ability to tap into its musical side. The Sony researchers call the AI DrumNet, and according to a paper published on the preprint server arXiv, they trained it using 665 pop, rock, and electro songs, each featuring bass, kick drum, and snare drum tracks. After this training, the AI drummer could produce its own "musically plausible" kick drum beat for a song based on the other instruments in the track. The tracks DrumNet produces aren't anywhere near as creative as Keith Moon's or Questlove's, but they don't seem wildly out of place either, based on the examples shared by Sony.


Learning to Groove with Inverse Sequence Transformations

Gillick, Jon, Roberts, Adam, Engel, Jesse, Eck, Douglas, Bamman, David

arXiv.org Machine Learning

We explore models for translating abstract musical ideas (scores, rhythms) into expressive performances using Seq2Seq and recurrent Variational Information Bottleneck (VIB) models. Though Seq2Seq models usually require painstakingly aligned corpora, we show that it is possible to adapt an approach from the Generative Adversarial Network (GAN) literature (e.g. Pix2Pix (Isola et al., 2017) and Vid2Vid (Wang et al. 2018a)) to sequences, creating large volumes of paired data by performing simple transformations and training generative models to plausibly invert these transformations. Music, and drumming in particular, provides a strong test case for this approach because many common transformations (quantization, removing voices) have clear semantics, and models for learning to invert them have real-world applications. Focusing on the case of drum set players, we create and release a new dataset for this purpose, containing over 13 hours of recordings by professional drummers aligned with fine-grained timing and dynamics information. We also explore some of the creative potential of these models, including demonstrating improvements on state-of-the-art methods for Humanization (instantiating a performance from a musical score).
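The paired-data trick the abstract describes, applying a simple forward transformation and training a model to invert it, can be illustrated with one of the transformations it names: quantization. The sketch below is an assumption-laden toy (grid size and onset times are invented), not the paper's preprocessing code.

```python
import numpy as np

def quantize(onsets, grid=0.25):
    """Snap expressive onset times (in beats) to a metronomic grid.

    Applying this 'forward' transformation to real drum recordings
    yields (quantized, expressive) training pairs for free; a model
    is then trained to invert it, i.e. to re-humanize the score.
    """
    return np.round(np.asarray(onsets, dtype=float) / grid) * grid

# Hypothetical expressive onsets with slight human timing deviations.
expressive = [0.02, 0.48, 1.03, 1.51]
score = quantize(expressive)
print(score)  # [0.   0.5  1.   1.5]
```

Because the transformation is cheap and its semantics are clear, every recording in the 13-hour dataset can supply aligned input/output pairs without any manual annotation.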


Musician Who Lost His Arm Plays Piano Again with AI Prosthesis

#artificialintelligence

A galaxy far, far away is a little closer with the invention of a robotic arm inspired by Luke Skywalker's bionic hand. And while this arm may not wield a lightsaber, it has a greater power for jazz musician Jason Barnes -- it lets him play the piano for the first time in five years. Barnes, who lost much of his right arm in a work accident, is back at the keys with an AI prosthesis created by researchers at the Georgia Institute of Technology. Unlike most prosthetics, it gives the 28-year-old the ability to control each finger individually. With it, Barnes can play Beethoven.


Humans can mimic machines, too; look out, AutoTune - CDM Create Digital Music

#artificialintelligence

As machines create more-perfect vocal and instrumental performances, a funny thing is happening: humans are catching up. The normal assumption about machine learning or "cyborg" technology is, as technology improves, we'll augment ourselves with more technology. But that misses the fact that humans, both individually and socially, are also smart and adaptable. We start to learn from the tech. I once met Stewart Copeland (The Police, composer), and he talked about this very phenomenon. A lot of the sound of The Police involved Stewart's playing routed through various effects.