guitar
Paul McCartney on playing guitar with Paul Mescal: 'He knew it better than I did!'
"Hey, I know you!" exclaims Paul McCartney, gripping my hand as we walk into his office in central London. And while I'm realistic enough to know he doesn't really hold treasured memories of our previous encounters, I'm impressed by his ability to defuse the tension of Meeting A Beatle. We gather in Soho at lunchtime. Instead of Wild Honey Pie or Savoy Truffle, McCartney has opted for a simple bagel (topping: a terrifying blend of Marmite and hummus), which he prepared in a kitchenette next to his assistant's desk. As he eats, he scans a printed list of film titles - mainly vintage comedies - looking for something to play at his family movie night.
Taylor Swift files to trademark voice and image after AI concerns
Taylor Swift has applied to trademark her voice and appearance in an apparent attempt to protect herself from artificial intelligence impersonations. The pop superstar has lodged three trademark applications in the US - one using a photo of herself on stage during her Eras Tour, and the other two being audio clips of her introducing herself while promoting her last album. AI-generated versions of Swift have cropped up in various ways in recent years - from explicit images to a fake election ad in which she appeared to urge people to vote for Donald Trump. The move comes after actor Matthew McConaughey became the first celebrity to use trademark rules to attempt to protect his voice and image from AI misuse earlier this year. Trademark applications are a relatively new way for celebrities to combat the growing issue of AI rip-offs.
A Graph Engine for Guitar Chord-Tone Soloing Education
Keating, Matthew, Casey, Michael
We present a graph-based engine for computing chord tone soloing suggestions for guitar students. Chord tone soloing is a fundamental practice for improvising over a chord progression, where the instrumentalist uses only the notes contained in the current chord. This practice is a building block for all advanced jazz guitar theory but is difficult to learn and practice. First, we discuss methods for generating chord-tone arpeggios. Next, we construct a weighted graph where each node represents a chord tone arpeggio for a chord in the progression. Then, we calculate the edge weight between each consecutive chord's nodes in terms of optimal transition tones. We then find the shortest path through this graph and reconstruct a chord-tone soloing line. Finally, we discuss a user-friendly system to handle input and output to this engine for guitar students to practice chord tone soloing.
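The shortest-path step described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' engine. The candidate arpeggios, the function names `transition_cost` and `best_line`, and the cost function (the semitone jump from the last note of one arpeggio to the first note of the next) are all assumptions made for illustration; the abstract does not say how "optimal transition tones" are scored.

```python
# Hypothetical sketch: arpeggio candidates and the cost function are assumptions.

def transition_cost(prev_arp, next_arp):
    """Assumed cost: semitone jump from the last note of prev_arp to the first of next_arp."""
    return abs(next_arp[0] - prev_arp[-1])


def best_line(layers):
    """Shortest path through a layered graph, solved by dynamic programming.

    layers[i] lists the candidate arpeggios (tuples of MIDI pitches) for chord i.
    Edges only connect candidates of consecutive chords.
    Returns (total_cost, chosen_arpeggios).
    """
    # cost[j] = cheapest path ending at candidate j of the current layer
    cost = [0] * len(layers[0])
    back = []  # back[i][j] = best predecessor (in layer i) of candidate j in layer i + 1
    for prev, nxt in zip(layers, layers[1:]):
        new_cost, pointers = [], []
        for cand in nxt:
            options = [cost[k] + transition_cost(p, cand) for k, p in enumerate(prev)]
            k_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[k_best])
            pointers.append(k_best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest path back from the last layer to the first.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, [layers[i][j] for i, j in enumerate(path)]


# Example progression Dm7 - G7 - Cmaj7, with illustrative arpeggio candidates.
layers = [
    [(50, 53, 57, 60), (60, 57, 53, 50)],                    # Dm7 up, Dm7 down
    [(55, 59, 62, 65), (59, 62, 65, 67), (65, 62, 59, 55)],  # G7 variants
    [(60, 64, 67, 71), (64, 67, 71, 72)],                    # Cmaj7 variants
]
total, line = best_line(layers)
```

Because edges only join consecutive chords, a single forward pass over the layers is enough here; a general shortest-path algorithm such as Dijkstra's would give the same result on this graph.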
A Machine Learning Approach for MIDI to Guitar Tablature Conversion
Kaliakatsos-Papakostas, Maximos, Bastas, Gregoris, Makris, Dimos, Herremans, Dorien, Katsouros, Vassilis, Maragos, Petros
Guitar tablature transcription consists in deducing the string and the fret number on which each note should be played to reproduce the actual musical part. This assignment should lead to playable string-fret combinations throughout the entire track and, in general, preserve parsimonious motion between successive combinations. Throughout the history of guitar playing, specific chord fingerings have been developed across different musical styles that facilitate common idiomatic voicing combinations and motion between them. This paper presents a method for assigning guitar tablature notation to a given MIDI-based musical part (possibly consisting of multiple polyphonic tracks), i.e. no information about guitar-idiomatic expressional characteristics (e.g. bending) is involved. The current strategy is based on machine learning and requires a basic assumption about how far fingers can stretch on a fretboard; only standard 6-string guitar tuning is examined. The proposed method also examines the transcription of music pieces that were not meant to be played or could not possibly be played by a guitar (e.g. potentially a symphonic orchestra part), employing a rudimentary method for augmenting musical information and training/testing the system with artificial data. The results present interesting aspects about what the system can achieve when trained on the initial and augmented dataset, showing that the training with augmented data improves the performance even in simple, e.g. monophonic, cases. Results also indicate weaknesses and lead to useful conclusions about possible improvements.
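To make the string-fret assignment problem concrete, the following Python sketch enumerates candidate positions for a set of MIDI pitches in standard tuning and filters them by a finger-stretch limit. It is a brute-force illustration only, not the machine learning method of the paper. The names `positions` and `playable_shapes` are hypothetical, and the limits `max_fret=20` and `max_stretch=4` are assumed values; the abstract does not give the paper's own stretch assumption.

```python
from itertools import product

# MIDI pitches of the open strings in standard tuning: E2 A2 D3 G3 B3 E4.
STANDARD_TUNING = [40, 45, 50, 55, 59, 64]


def positions(pitch, max_fret=20):
    """All (string, fret) pairs that sound `pitch`; string 6 is the low E."""
    result = []
    for idx, open_pitch in enumerate(STANDARD_TUNING):
        fret = pitch - open_pitch
        if 0 <= fret <= max_fret:
            result.append((6 - idx, fret))
    return result


def playable_shapes(pitches, max_stretch=4, max_fret=20):
    """Assignments of simultaneous pitches to distinct strings within an assumed stretch.

    Open strings (fret 0) need no finger, so they are ignored in the span check.
    """
    shapes = []
    for shape in product(*(positions(p, max_fret) for p in pitches)):
        strings = [s for s, _ in shape]
        fretted = [f for _, f in shape if f > 0]
        if len(set(strings)) != len(strings):
            continue  # two notes cannot share one string
        if fretted and max(fretted) - min(fretted) > max_stretch:
            continue  # exceeds the assumed finger stretch
        shapes.append(shape)
    return shapes


e4_options = positions(64)                  # the same E4 exists on five strings
c_major = playable_shapes([48, 52, 55])     # C3, E3, G3
```

Even a single pitch usually has several valid positions, which is why choosing among them so that consecutive shapes stay close together is the hard part of the task.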
GOAT: A Large Dataset of Paired Guitar Audio Recordings and Tablatures
Loth, Jackson, Sarmento, Pedro, Sarkar, Saurjya, Guo, Zixun, Barthet, Mathieu, Sandler, Mark
In recent years, the guitar has received increased attention from the music information retrieval (MIR) community driven by the challenges posed by its diverse playing techniques and sonic characteristics. Mainly fueled by deep learning approaches, progress has been limited by the scarcity and limited annotations of datasets. To address this, we present the Guitar On Audio and Tablatures (GOAT) dataset, comprising 5.9 hours of unique high-quality direct input audio recordings of electric guitars from a variety of different guitars and players. We also present an effective data augmentation strategy using guitar amplifiers which delivers near-unlimited tonal variety, of which we provide a starting 29.5 hours of audio. Each recording is annotated using guitar tablatures, a guitar-specific symbolic format supporting string and fret numbers, as well as numerous playing techniques. For this we utilise both the Guitar Pro format, a software for tablature playback and editing, and a text-like token encoding. Furthermore, we present competitive results using GOAT for MIDI transcription and preliminary results for a novel approach to automatic guitar tablature transcription. We hope that GOAT opens up the possibilities to train novel models on a wide variety of guitar-related MIR tasks, from synthesis to transcription to playing technique detection.
SemanticControl: A Training-Free Approach for Handling Loosely Aligned Visual Conditions in ControlNet
Joung, Woosung, Chae, Daewon, Kim, Jinkyu
ControlNet has enabled detailed spatial control in text-to-image diffusion models by incorporating additional visual conditions such as depth or edge maps. However, its effectiveness heavily depends on the availability of visual conditions that are precisely aligned with the generation goal specified by the text prompt, a requirement that often fails in practice, especially for uncommon or imaginative scenes. For example, generating an image of a cat cooking in a specific pose may be infeasible due to the lack of suitable visual conditions. In contrast, structurally similar cues can often be found in more common settings: for instance, poses of humans cooking are widely available and can serve as rough visual guides. Unfortunately, existing ControlNet models struggle to use such loosely aligned visual conditions, often resulting in low text fidelity or visual artifacts. To address this limitation, we propose SemanticControl, a training-free method for effectively leveraging misaligned but semantically relevant visual conditions. Our approach adaptively suppresses the influence of the visual condition where it conflicts with the prompt, while strengthening guidance from the text. The key idea is to first run an auxiliary denoising process using a surrogate prompt aligned with the visual condition (e.g., "a human playing guitar" for a human pose condition) to extract informative attention masks, and then utilize these masks during the denoising of the actual target prompt (e.g., "a cat playing guitar"). Experimental results demonstrate that our method improves performance under loosely aligned conditions across various condition types, including depth maps, edge maps, and human skeletons, outperforming existing baselines. Our code is available at https://mung3477.github.io/semantic-control.
Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation
Zhou, Jinxing, Zhou, Yanghao, Han, Mingfei, Wang, Tong, Chang, Xiaojun, Cholakkal, Hisham, Anwer, Rao Muhammad
Referring Audio-Visual Segmentation (Ref-AVS) aims to segment target objects in audible videos based on given reference expressions. Prior works typically rely on learning latent embeddings via multimodal fusion to prompt a tunable SAM/SAM2 decoder for segmentation, which requires strong pixel-level supervision and lacks interpretability. From a novel perspective of explicit reference understanding, we propose TGS-Agent, which decomposes the task into a Think-Ground-Segment process, mimicking the human reasoning procedure by first identifying the referred object through multimodal analysis, followed by coarse-grained grounding and precise segmentation. To this end, we first propose Ref-Thinker, a multimodal language model capable of reasoning over textual, visual, and auditory cues. We construct an instruction-tuning dataset with explicit object-aware think-answer chains for Ref-Thinker fine-tuning. The object description inferred by Ref-Thinker is used as an explicit prompt for Grounding-DINO and SAM2, which perform grounding and segmentation without relying on pixel-level supervision. Additionally, we introduce R²-AVSBench, a new benchmark with linguistically diverse and reasoning-intensive references for better evaluating model generalization. Our approach achieves state-of-the-art results on both standard Ref-AVSBench and proposed R²-AVSBench. Code will be available at https://github.com/jasongief/TGS-Agent.
Music Source Restoration
Zang, Yongyi, Dai, Zheqi, Plumbley, Mark D., Kong, Qiuqiang
We introduce Music Source Restoration (MSR), a novel task addressing the gap between idealized source separation and real-world music production. Current Music Source Separation (MSS) approaches assume mixtures are simple sums of sources, ignoring signal degradations employed during music production like equalization, compression, and reverb. MSR models mixtures as degraded sums of individually degraded sources, with the goal of recovering original, undegraded signals. Due to the lack of data for MSR, we present RawStems, a dataset annotation of 578 songs with unprocessed source signals organized into 8 primary and 17 secondary instrument groups, totaling 354.13 hours. To the best of our knowledge, RawStems is the first dataset that contains unprocessed music stems with hierarchical categories. We consider spectral filtering, dynamic range compression, harmonic distortion, reverb and lossy codec as possible degradations, and establish U-Former as a baseline method, demonstrating the feasibility of MSR on our dataset. We publicly release the RawStems dataset annotations, degradation simulation pipeline, training code and pre-trained models.
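The difference between the two mixture models in the abstract above can be shown with a toy Python sketch. It is an illustration under stated assumptions, not the paper's degradation pipeline: the three-sample signals are made up, and the helpers `clip`, `gain`, and `mix` are hypothetical stand-ins, with hard clipping used as a crude proxy for harmonic distortion and a fixed gain as a placeholder for any per-source processing.

```python
def clip(signal, limit):
    """Hard clipping to [-limit, limit], a crude stand-in for harmonic distortion."""
    return [max(-limit, min(limit, x)) for x in signal]


def gain(signal, g):
    """Scale every sample by g, a placeholder for per-source processing."""
    return [g * x for x in signal]


def mix(sources):
    """Sample-wise sum of equally long signals."""
    return [sum(samples) for samples in zip(*sources)]


# Made-up, undegraded source signals (three samples each).
vocals = [0.5, -1.0, 0.25]
guitar = [1.0, 0.5, -0.5]

# MSS assumption: the mixture is the plain sum of the sources.
mss_mixture = mix([vocals, guitar])

# MSR assumption: each source is degraded first, then the sum is degraded again.
degraded_sources = [clip(vocals, 0.5), gain(guitar, 0.5)]
msr_mixture = clip(mix(degraded_sources), 0.75)
```

Under the MSR model, the targets to recover are still `vocals` and `guitar`, not the degraded versions that actually appear in the mixture.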