pianist


Deconstructing Jazz Piano Style Using Machine Learning

Cheston, Huw, Bance, Reuben, Harrison, Peter M. C.

arXiv.org Artificial Intelligence

For a visual artist, their style might include aspects such as subject choice, colour choice, and brush techniques; for a writer, it might include vocabulary, syntactic constructions, and narrative archetypes; for a composer, it might include harmonic progressions, rhythmic patterns, and melodic motifs. Individual differences across all these parameters, and more, come together to define each artist's unique style. Most of these stylistic parameters can theoretically be assessed by human experts. However, such assessments are necessarily slow and hence hard to apply at scale. Subjectivity is also a problem, since every human analyst comes with their own history of artistic exposure that will inevitably affect how they interpret artworks. Computational methods promise a more scalable and objective approach to this problem. Once a researcher has crafted an algorithm that captures a particular stylistic parameter -- for example, using entropy to measure vocabulary complexity -- a computer can easily apply it to large datasets, and hence compare different artists on this parameter (Abry et al., 2013; Cheston et al., 2024b; Deepaisarn et al., 2023; Li et al., 2012).
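The entropy example can be made concrete. Below is a minimal sketch (not the authors' actual method) that scores vocabulary complexity as the Shannon entropy of a text's word-frequency distribution; the sample texts are invented for illustration.

```python
from collections import Counter
from math import log2

def vocabulary_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the word-frequency distribution."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A repetitive text scores lower than a varied one of similar length.
repetitive = "la la la la la la la la"
varied = "night falls softly over quiet rivers and distant hills"
print(vocabulary_entropy(repetitive))  # 0.0 (a single repeated word)
print(vocabulary_entropy(varied) > vocabulary_entropy(repetitive))  # True
```

The same scheme scales trivially: map the function over a corpus of one artist's works and compare the resulting distributions across artists.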


Robotic exoskeleton can train expert pianists to play faster

New Scientist

A robotic hand exoskeleton can help expert pianists learn to play even faster by moving their fingers for them. Robotic exoskeletons have long been used to rehabilitate people who can no longer use their hands through injury or disease, but using them to improve the abilities of able-bodied people has been less well explored. Now, Shinichi Furuya at Sony Computer Science Laboratories in Tokyo and his colleagues have found that a robotic exoskeleton can improve the finger speed of trained pianists after a single 30-minute training session. "I'm a pianist, but I [injured] my hand because of overpractising," says Furuya. "I was suffering from this dilemma, between overpractising and the prevention of the injury, so then I thought, I have to think about some way to improve my skills without practising."


PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making

Light, Jonathan, Xing, Sixue, Liu, Yuanzhe, Chen, Weiqin, Cai, Min, Chen, Xiusi, Wang, Guanzhi, Cheng, Wei, Yue, Yisong, Hu, Ziniu

arXiv.org Artificial Intelligence

Effective extraction of the world knowledge in LLMs for complex decision-making tasks remains a challenge. We propose PIANIST, a framework that decomposes the world model into seven intuitive components conducive to zero-shot LLM generation. Given only a natural language description of the game and of how input observations are formatted, our method can generate a working world model for fast and efficient MCTS simulation. We show that our method works well on two different games that challenge the planning and decision-making skills of the agent, for both language-based and non-language-based action taking, without any domain-specific training data or an explicitly defined world model.
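The abstract's pipeline (generate a world model, then search it with MCTS) can be illustrated with a hedged sketch. The toy model and its interface below are invented for illustration and do not reproduce the paper's seven-component decomposition; they only show the kind of functions a Monte Carlo tree search needs from a generated model.

```python
import math

class ToyModel:
    """Hypothetical stand-in for a generated world model. It exposes the
    functions the search needs: legal actions, a transition, a reward,
    and a termination test. The game: starting from 0, add 1 or 2 per
    move; after three moves, reward 1.0 only if the total is exactly 6."""
    def actions(self, state):
        return [1, 2]
    def step(self, state, action):
        return state + action
    def reward(self, state):
        return 1.0 if state == 6 else 0.0
    def is_terminal(self, state, depth):
        return depth >= 3

def mcts(model, root, n_sim=500, c=1.4):
    """UCT-style Monte Carlo tree search over the model interface."""
    stats = {}  # (depth, state, action) -> [visits, total_return]

    def simulate(state, depth):
        if model.is_terminal(state, depth):
            return model.reward(state)
        acts = model.actions(state)
        total = sum(stats.get((depth, state, a), [0, 0.0])[0] for a in acts)

        def ucb(a):
            n, v = stats.get((depth, state, a), [0, 0.0])
            if n == 0:                    # try unvisited actions first
                return float("inf")
            return v / n + c * math.sqrt(math.log(total) / n)

        a = max(acts, key=ucb)
        ret = simulate(model.step(state, a), depth + 1)
        n, v = stats.get((depth, state, a), [0, 0.0])
        stats[(depth, state, a)] = [n + 1, v + ret]   # backpropagate
        return ret

    for _ in range(n_sim):
        simulate(root, 0)
    # Act with the most-visited root action.
    return max(model.actions(root),
               key=lambda a: stats.get((0, root, a), [0, 0.0])[0])

print(mcts(ToyModel(), 0))  # 2, since only 2+2+2 reaches the goal of 6
```

In PIANIST's setting, the hand-written `ToyModel` would instead be code emitted zero-shot by an LLM from the game's natural language description.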


RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning

Zakka, Kevin, Wu, Philipp, Smith, Laura, Gileadi, Nimrod, Howell, Taylor, Peng, Xue Bin, Singh, Sumeet, Tassa, Yuval, Florence, Pete, Zeng, Andy, Abbeel, Pieter

arXiv.org Artificial Intelligence

Replicating human-like dexterity in robot hands represents one of the largest open problems in robotics. Reinforcement learning is a promising approach that has achieved impressive progress in the last few years; however, the class of problems it has typically addressed corresponds to a rather narrow definition of dexterity compared to human capabilities. To address this gap, we investigate piano playing, a skill that challenges even the limits of human dexterity, as a means to test high-dimensional control; it requires high spatial and temporal precision as well as complex finger coordination and planning. We introduce RoboPianist, a system that enables simulated anthropomorphic hands to learn an extensive repertoire of 150 piano pieces where traditional model-based optimization struggles. We additionally introduce an open-source environment, a benchmark of tasks, interpretable evaluation metrics, and open challenges for future study. Our website featuring videos, code, and datasets is available at https://kzakka.com/robopianist/
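One interpretable way to score a piano-playing policy, sketched here as an assumption rather than the paper's exact metric, is an F1 score over key-press events: compare the set of (time step, key) presses the score demands against the set the policy produced.

```python
def keypress_f1(target: set, played: set) -> float:
    """F1 between the (time_step, midi_key) events the score demands
    and the events the policy actually pressed."""
    if not target or not played:
        return 0.0
    tp = len(target & played)          # correctly timed, correct key
    if tp == 0:
        return 0.0
    precision = tp / len(played)
    recall = tp / len(target)
    return 2 * precision * recall / (precision + recall)

target = {(0, 60), (1, 64), (2, 67)}   # C, E, G on successive steps
played = {(0, 60), (1, 64), (2, 66)}   # last key wrong by a semitone
print(keypress_f1(target, played))     # 2/3: two of three events match
```

A metric like this is interpretable in exactly the sense the abstract emphasises: it decomposes into "wrong notes" (precision) and "missed notes" (recall).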


Reconstructing Human Expressiveness in Piano Performances with a Transformer Network

Tang, Jingjing, Wiggins, Geraint, Fazekas, Gyorgy

arXiv.org Artificial Intelligence

Capturing intricate and subtle variations in human expressiveness in music performance using computational approaches is challenging. In this paper, we propose a novel approach for reconstructing human expressiveness in piano performance with a multi-layer bi-directional Transformer encoder. To address the need for large amounts of accurately captured and score-aligned performance data in training neural networks, we use transcribed scores obtained from an existing transcription model to train our model. We integrate pianist identities to control the sampling process and explore the ability of our system to model variations in expressiveness for different pianists. The system is evaluated through statistical analysis of generated expressive performances and a listening test. Overall, the results suggest that our method achieves state-of-the-art performance in generating human-like piano performances from transcribed scores, while fully and consistently reconstructing human expressiveness poses further challenges. Our code is released at https://github.com/BetsyTang/RHEPP-Transformer.
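The core of a bi-directional Transformer encoder is unmasked self-attention: every event in the performance attends to the whole sequence, past and future alike. The NumPy sketch below illustrates that mechanism only; it is not the authors' model, and the feature dimensions are invented.

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention, with no causal
    mask: in a bidirectional encoder each position sees all others."""
    d = x.shape[-1]
    rng = np.random.default_rng(0)     # fixed weights for illustration
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d)
                     for _ in range(3))
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d)                      # (seq, seq)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ v

# e.g. 8 performance events, each a 16-dim feature vector
# (hypothetically encoding pitch, timing deviation, velocity, ...)
events = np.random.default_rng(1).standard_normal((8, 16))
out = self_attention(events)
print(out.shape)  # (8, 16)
```

Stacking several such layers (with feed-forward blocks and residual connections) gives the multi-layer encoder the abstract describes; conditioning on pianist identity would add an identity embedding to the inputs.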


JAZZVAR: A Dataset of Variations found within Solo Piano Performances of Jazz Standards for Music Overpainting

Row, Eleanor, Tang, Jingjing, Fazekas, George

arXiv.org Artificial Intelligence

Jazz pianists often uniquely interpret jazz standards. Passages from these interpretations can be viewed as sections of variation. We manually extracted such variations from solo jazz piano performances. The JAZZVAR dataset is a collection of 502 pairs of Variation and Original MIDI segments. Each Variation in the dataset is accompanied by a corresponding Original segment containing the melody and chords from the original jazz standard. Our dataset differs from many existing jazz datasets in the music information retrieval (MIR) community, which often focus on improvisation sections within jazz performances. In this paper, we outline the curation process for obtaining and sorting the repertoire, the pipeline for creating the Original and Variation pairs, and our analysis of the dataset. We also introduce a new generative music task, Music Overpainting, and present a baseline Transformer model trained on the JAZZVAR dataset for this task. Other potential applications of our dataset include expressive performance analysis and performer identification.
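The paired structure lends itself to a simple representation. The sketch below is hypothetical: the field names are not the dataset's actual schema, but they show how the 502 pairs map onto a conditional generation (Music Overpainting) task.

```python
from dataclasses import dataclass

@dataclass
class VariationPair:
    """One JAZZVAR-style pair; field names here are hypothetical."""
    standard: str          # e.g. "Autumn Leaves"
    original_midi: bytes   # melody + chords from the original standard
    variation_midi: bytes  # the pianist's variation on that passage

def overpainting_examples(pairs):
    """Music Overpainting framed as seq2seq: condition on the Original
    segment, predict the Variation segment."""
    return [(p.original_midi, p.variation_midi) for p in pairs]

pair = VariationPair("Autumn Leaves", b"<original>", b"<variation>")
print(overpainting_examples([pair]))
```

The same pairing also supports the other applications the abstract mentions: for performer identification, the Variation becomes the input and the pianist's identity the label.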


We need to discuss what jobs robots should do, before the decision is made for us

Robohub

The social separation imposed by the pandemic led us to rely on technology to an extent we might never have imagined – from Teams and Zoom to online banking and vaccine status apps. Now, society faces an increasing number of decisions about our relationship with technology. For example, do we want our workforce needs fulfilled by automation, migrant workers, or an increased birth rate? In the coming years, we will also need to balance technological innovation with people's wellbeing – both in terms of the work they do and the social support they receive. And there is the question of trust. When humans should trust robots, and vice versa, is a question our Trust Node team is researching as part of the UKRI Trustworthy Autonomous Systems hub.


The Scariest Thing About M3gan

Slate

This weekend, I succumbed to the pull of all the meme-y marketing and went to the theater to see the surprise horror-comedy hit M3gan. I generally enjoyed it--the jokes are funny, the jump scares effective, the robot-centric plot a rather smart addition to our fresh new wave of artificial intelligence anxiety. It isn't the goriest or most frightening flick--the blood streams had to stay PG-13--but the steadily paced tension and the references to horror classics do their job fine. Yet, to me, the most chilling aspect of the movie doesn't come from anything you might expect: the offscreen murders, M3gan's deranged humanoid face, the pressures of capitalism. It actually stems from a deceptively insignificant 10-second scene that comes about halfway through the movie, in which the titular bot takes to the house piano. To be clear, I don't find this scene so viscerally terrifying for the piano tune itself (in the film, a solid instrumental cover of Martika's 1989 No. 1 hit "Toy Soldiers"), or for the overall menace of the moment, a turning point in M3gan's development.


How much would you pay to use ChatGPT?

#artificialintelligence

ChatGPT, launched by OpenAI in late November 2022, is the new talk of the town. Everyone's raving about its user-friendliness and the mind-blowing variety of its skills: it can generate both a fiction piece out of thin air and a functional Python script. We've seen people using it to write cover letters, school essays and political speeches. I even used it to write a song called Crypto Winter. Two weeks ago, I subscribed to a Google Alert for ChatGPT, and it's one of the longest notification emails I receive from the service every morning. Everyone, from writers to lawyers, developers and even politicians, seems to be talking about ChatGPT.


RoMQA: A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering

Zhong, Victor, Shi, Weijia, Yih, Wen-tau, Zettlemoyer, Luke

arXiv.org Artificial Intelligence

We introduce RoMQA, the first benchmark for robust, multi-evidence, multi-answer question answering (QA). RoMQA contains clusters of questions that are derived from related constraints mined from the Wikidata knowledge graph. RoMQA evaluates robustness of QA models to varying constraints by measuring worst-case performance within each question cluster. Compared to prior QA datasets, RoMQA has more human-written questions that require reasoning over more evidence text and have, on average, many more correct answers. In addition, human annotators rate RoMQA questions as more natural or likely to be asked by people. We evaluate state-of-the-art large language models in zero-shot, few-shot, and fine-tuning settings, and find that RoMQA is challenging: zero-shot and few-shot models perform similarly to naive baselines, while supervised retrieval methods perform well below gold evidence upper bounds. Moreover, existing models are not robust to variations in question constraints, but can be made more robust by tuning on clusters of related questions. Our results show that RoMQA is a challenging benchmark for large language models, and provides a quantifiable test to build more robust QA methods.
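The worst-case-per-cluster evaluation the abstract describes is easy to state concretely. The sketch below (a minimal illustration, with invented question IDs and scores) aggregates per-question scores into a robustness score for each cluster of related questions.

```python
def worst_case_by_cluster(scores: dict, clusters: dict) -> dict:
    """Robustness as RoMQA measures it: within each cluster of related
    questions, report the model's *worst* per-question score, so one
    failed constraint variant drags the whole cluster down."""
    return {name: min(scores[q] for q in qids)
            for name, qids in clusters.items()}

scores = {"q1": 0.9, "q2": 0.4, "q3": 0.8}
clusters = {"cluster_A": ["q1", "q2"], "cluster_B": ["q3"]}
print(worst_case_by_cluster(scores, clusters))
# {'cluster_A': 0.4, 'cluster_B': 0.8}
```

Averaging these minima across clusters then penalises models that answer a question correctly but fail a near-identical variant with a changed constraint, which is exactly the brittleness the benchmark is built to expose.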