A Pro-Russia Disinformation Campaign Is Using Free AI Tools to Fuel a 'Content Explosion'

WIRED

A pro-Russia disinformation campaign is leveraging consumer artificial intelligence tools to fuel a "content explosion" focused on exacerbating existing tensions around global elections, Ukraine, and immigration, among other controversial issues, according to new research published last week. The campaign, known by many names including Operation Overload and Matryoshka (other researchers have also tied it to Storm-1679), has been operating since 2023 and has been aligned with the Russian government by multiple groups, including Microsoft and the Institute for Strategic Dialogue. While the campaign targets audiences around the world, including in the US, its main target has been Ukraine. Hundreds of AI-manipulated videos from the campaign have tried to fuel pro-Russian narratives. The report outlines how, between September 2024 and May 2025, the amount of content being produced by those running the campaign has increased dramatically and is receiving millions of views around the world.


The best portable Bluetooth speakers for 2025, tested and reviewed

Popular Science

We may earn revenue from the products available on this page and participate in affiliate programs. Let's face it: Your phone's built-in sound sucks, so you need a portable Bluetooth speaker. Sure, everything is relative, and those phone speakers are amazing compared to what, say, a 2005 flip phone sounded like. But do we really want to justify our tech based on when people published think-pieces on how texting was the new hotness? So while we can admit you can hear musical cues right out of your pocket, if you want to feel the actual emotional resonance that makes the music special, the speakers on even the best smartphone, the best tablet, the best laptop … ultimately suck. But the best portable Bluetooth speakers--from the compact Bose SoundLink Plus to the more substantial Brane X, for example--do not suck, so we're ready to help you select the right speaker for any situation. We test a lot of Bluetooth speakers throughout the year, giving us deep insight into what's on the marketplace and what's worth your money. Whether you're looking for something budget or audiophile, chances are we've heard at least one model from whatever brand you're considering. We combine these experiences with other users' impressions, then top it all off with extensive research on what you should be looking for: IP rating, frequency range, battery life, Bluetooth range … we've got you! This lets us find the perfect balance of specs and special features from a fairly dense pool of possibilities. From extreme durability to supreme connectivity, we've got you covered when it comes to the best portable Bluetooth speakers. Whether you're always on the go or simply need something to take to the front porch, these speakers will deliver quality sound without any cables or wires weighing you down. Why it made the cut: The Bose SoundLink Plus portable Bluetooth speaker is styled for motion, tuned for emotion, with high cost being the primary shortcoming.
New for 2025, the $269 SoundLink Plus is built with a powder-coated steel grille and a shock-resistant chassis wrapped in color-matched silicone.


Cloudflare Is Blocking AI Crawlers by Default

WIRED

Last year, internet infrastructure firm Cloudflare launched tools enabling its customers to block AI scrapers. Today the company has taken its fight against permissionless scraping several steps further. It has switched to blocking AI crawlers by default for its customers and is moving forward with a Pay Per Crawl program that lets customers charge AI companies to scrape their websites. Web crawlers have trawled the internet for information for decades. Without them, people would lose vitally important online tools, from Google Search to the Internet Archive's invaluable digital preservation work.


TOMI: Transforming and Organizing Music Ideas for Multi-Track Compositions with Full-Song Structure

arXiv.org Artificial Intelligence

Hierarchical planning is a powerful approach to model long sequences structurally. Aside from considering hierarchies in the temporal structure of music, this paper explores an even more important aspect: concept hierarchy, which involves generating music ideas, transforming them, and ultimately organizing them--across musical time and space--into a complete composition. To this end, we introduce TOMI (Transforming and Organizing Music Ideas) as a novel approach in deep music generation and develop a TOMI-based model via an instruction-tuned foundation LLM. Formally, we represent a multi-track composition process via a sparse, four-dimensional space characterized by clips (short audio or MIDI segments), sections (temporal positions), tracks (instrument layers), and transformations (elaboration methods). Our model is capable of generating multi-track electronic music with full-song structure, and we further integrate the TOMI-based model with the REAPER digital audio workstation, enabling interactive human-AI co-creation. Experimental results demonstrate that our approach produces higher-quality electronic music with stronger structural coherence compared to baselines.
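To make the sparse, four-dimensional representation described above more concrete, here is a minimal Python sketch. It is not the authors' implementation: the class names `Placement` and `Composition`, the method names, and the example clip, section, track, and transformation labels are all hypothetical. It only illustrates the idea that a composition can be stored as the set of occupied (clip, section, track, transformation) cells rather than as a dense grid.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Placement:
    """One occupied cell in the (clip, section, track, transformation) space.

    Hypothetical structure for illustration only.
    """
    clip: str            # short audio or MIDI segment
    section: str         # temporal position in the song
    track: str           # instrument layer
    transformation: str  # elaboration method applied to the clip


@dataclass
class Composition:
    """Sparse storage: only the cells that are actually used are kept."""
    placements: set = field(default_factory=set)

    def add(self, clip, section, track, transformation):
        self.placements.add(Placement(clip, section, track, transformation))

    def clips_in(self, section, track):
        """Return the sorted clip names placed at a given section and track."""
        return sorted(
            p.clip for p in self.placements
            if p.section == section and p.track == track
        )


# Example: the same clip is reused in two sections with different transformations.
song = Composition()
song.add("bass_riff", "verse_1", "bass", "none")
song.add("bass_riff", "chorus_1", "bass", "octave_up")
song.add("lead_hook", "chorus_1", "synth", "none")
```

Storing a set of tuples keeps the structure sparse: a song with many sections and tracks only pays for the cells that hold a clip.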


Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate

arXiv.org Artificial Intelligence

Remarkable progress in text-to-image diffusion models has brought a major concern about potentially generating images on inappropriate or trademarked concepts. Concept erasing has been investigated with the goals of deleting target concepts in diffusion models while preserving other concepts with minimal distortion. To achieve these goals, recent concept erasing methods usually fine-tune the cross-attention layers of diffusion models. In this work, we first show that merely updating the cross-attention layers in diffusion models, which is mathematically equivalent to adding linear modules to weights, may not be able to preserve diverse remaining concepts. Then, we propose a novel framework, dubbed Concept Pinpoint Eraser (CPE), by adding nonlinear Residual Attention Gates (ResAGs) that selectively erase (or cut) target concepts while safeguarding remaining concepts from broad distributions by employing an attention anchoring loss to prevent forgetting. Moreover, we adversarially train CPE with ResAG and learnable text embeddings in an iterative manner to maximize erasing performance and enhance robustness against adversarial attacks. Extensive experiments on the erasure of celebrities, artistic styles, and explicit content demonstrated that the proposed CPE outperforms prior art by keeping diverse remaining concepts while deleting the target concepts with robustness against attack prompts. Code is available at https://github.com/Hyun1A/CPE


CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following

arXiv.org Artificial Intelligence

Recent advances in audio-text large language models (LLMs) have opened new possibilities for music understanding and generation. However, existing benchmarks are limited in scope, often relying on simplified tasks or multi-choice evaluations that fail to reflect the complexity of real-world music analysis. We reinterpret a broad range of traditional MIR annotations as instruction-following formats and introduce CMI-Bench, a comprehensive music instruction following benchmark designed to evaluate audio-text LLMs on a diverse set of music information retrieval (MIR) tasks. These include genre classification, emotion regression, emotion tagging, instrument classification, pitch estimation, key detection, lyrics transcription, melody extraction, vocal technique recognition, instrument performance technique detection, music tagging, music captioning, and (down)beat tracking, reflecting core challenges in MIR research. Unlike previous benchmarks, CMI-Bench adopts standardized evaluation metrics consistent with previous state-of-the-art MIR models, ensuring direct comparability with supervised approaches. We provide an evaluation toolkit supporting all open-source audio-textual LLMs, including LTU, Qwen-audio, SALMONN, MusiLingo, etc. Experimental results reveal significant performance gaps between LLMs and supervised models, along with their cultural, chronological, and gender biases, highlighting the potential and limitations of current models in addressing MIR tasks. CMI-Bench establishes a unified foundation for evaluating music instruction following, driving progress in music-aware LLMs.


Investigating an Overfitting and Degeneration Phenomenon in Self-Supervised Multi-Pitch Estimation

arXiv.org Artificial Intelligence

Multi-Pitch Estimation (MPE) continues to be a sought-after capability of Music Information Retrieval (MIR) systems, and is critical for many applications and downstream tasks involving pitch, including music transcription. However, existing methods are largely based on supervised learning, and there are significant challenges in collecting annotated data for the task. Recently, self-supervised techniques exploiting intrinsic properties of pitch and harmonic signals have shown promise for both monophonic and polyphonic pitch estimation, but these still remain inferior to supervised methods. In this work, we extend the classic supervised MPE paradigm by incorporating several self-supervised objectives based on pitch-invariant and pitch-equivariant properties. This joint training results in a substantial improvement under closed training conditions, which naturally suggests that applying the same objectives to a broader collection of data will yield further improvements. However, in doing so we uncover a phenomenon whereby our model simultaneously overfits to the supervised data while degenerating on data used for self-supervision only. We demonstrate and investigate this and offer our insights on the underlying problem.
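As a rough illustration of what a pitch-equivariant objective can look like, the sketch below penalizes a model whose output does not shift by the same number of frequency bins as its input. This is a generic toy formulation, not the paper's actual loss: the function names, the zero-filled shift, the mean-squared-error penalty, and both toy models are assumptions made for the example.

```python
def shift_bins(frame, k):
    """Shift per-bin values by k bins (k may be negative), zero-filling the gap.

    Values shifted past either edge are dropped (an assumed boundary rule).
    """
    n = len(frame)
    out = [0.0] * n
    for i, value in enumerate(frame):
        j = i + k
        if 0 <= j < n:
            out[j] = value
    return out


def equivariance_loss(model, frame, k):
    """Mean squared gap between model(shift(x)) and shift(model(x))."""
    shifted_then_predicted = model(shift_bins(frame, k))
    predicted_then_shifted = shift_bins(model(frame), k)
    squared = [
        (a - b) ** 2
        for a, b in zip(shifted_then_predicted, predicted_then_shifted)
    ]
    return sum(squared) / len(frame)


# Toy stand-ins for a pitch estimator (hypothetical, for illustration only).
def scale_model(frame):
    # Treats every bin identically, so it commutes with shifting.
    return [2.0 * v for v in frame]


def position_weighted_model(frame):
    # Weights each bin by its index, so it does not commute with shifting.
    return [i * v for i, v in enumerate(frame)]


frame = [0.0, 1.0, 0.0, 0.0]
loss_equivariant = equivariance_loss(scale_model, frame, 1)
loss_not_equivariant = equivariance_loss(position_weighted_model, frame, 1)
```

The first toy model yields zero loss because it treats all bins alike, while the second is penalized because its response depends on absolute bin position.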


Decoding Memes: Benchmarking Narrative Role Classification across Multilingual and Multimodal Models

arXiv.org Artificial Intelligence

This work investigates the challenging task of identifying narrative roles - Hero, Villain, Victim, and Other - in Internet memes, across three diverse test sets spanning English and code-mixed (English-Hindi) languages. Building on an annotated dataset originally skewed toward the 'Other' class, we explore a more balanced and linguistically diverse extension, originally introduced as part of the CLEF 2024 shared task. Comprehensive lexical and structural analyses highlight the nuanced, culture-specific, and context-rich language used in real memes, in contrast to synthetically curated hateful content, which exhibits explicit and repetitive lexical markers. To benchmark the role detection task, we evaluate a wide spectrum of models, including fine-tuned multilingual transformers, sentiment and abuse-aware classifiers, instruction-tuned LLMs, and multimodal vision-language models. Performance is assessed under zero-shot settings using precision, recall, and F1 metrics. We also explore prompt design strategies to guide multi-modal models and find that hybrid prompts incorporating structured instructions and role definitions offer marginal yet consistent improvements. Our findings underscore the importance of cultural grounding, prompt engineering, and multimodal reasoning in modelling subtle narrative framings in visual-textual content. Warning: This paper contains potentially harmful and offensive content.
I. Introduction
Social media platforms have become pivotal arenas for rapid information dissemination. However, this openness has also catalysed the proliferation of harmful content - including hate speech, propaganda, and misinformation, often embedded within memes [1], [2]. Memes, with their multimodal structure and cultural resonance, are particularly potent in shaping public opinion and propagating ideologies.
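The precision, recall, and F1 metrics named in the abstract above can be computed per role as in the following sketch. The lowercase role labels mirror the four classes in the abstract, but the example predictions are invented, and macro-averaging F1 across roles is an assumption here; the excerpt does not say which averaging the authors use.

```python
def per_class_prf(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} from paired gold and predicted labels."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1, so rare roles count as much as common ones."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Invented toy data for illustration only.
labels = ["hero", "villain", "victim", "other"]
y_true = ["hero", "villain", "victim", "other", "other", "villain"]
y_pred = ["hero", "other", "victim", "other", "villain", "villain"]

scores = per_class_prf(y_true, y_pred, labels)
overall = macro_f1(scores)
```

Macro-averaging is a common choice when one class dominates, as the abstract says 'Other' originally did, because a model cannot score well just by predicting the majority class.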


Machine Understanding of Scientific Language

arXiv.org Artificial Intelligence

Scientific information expresses human understanding of nature. This knowledge is largely disseminated in different forms of text, including scientific papers, news articles, and discourse among people on social media. While important for accelerating our pursuit of knowledge, not all scientific text is faithful to the underlying science. As the volume of this text has burgeoned online in recent years, it has become a problem of societal importance to be able to identify the faithfulness of a given piece of scientific text automatically. This thesis is concerned with the cultivation of datasets, methods, and tools for machine understanding of scientific language, in order to analyze and understand science communication at scale. To arrive at this, I present several contributions in three areas of natural language processing and machine learning: automatic fact checking, learning with limited data, and scientific text processing. These contributions include new methods and resources for identifying check-worthy claims, adversarial claim generation, multi-source domain adaptation, learning from crowd-sourced labels, cite-worthiness detection, zero-shot scientific fact checking, detecting exaggerated scientific claims, and modeling degrees of information change in science communication. Critically, I demonstrate how the research outputs of this thesis are useful for effectively learning from limited amounts of scientific text in order to identify misinformative scientific statements and generate new insights into the science communication process.


Scaling Self-Supervised Representation Learning for Symbolic Piano Performance

arXiv.org Artificial Intelligence

We study the capabilities of generative autoregressive transformer models trained on large amounts of symbolic solo-piano transcriptions. After first pretraining on approximately 60,000 hours of music, we use a comparatively smaller, high-quality subset to finetune models to produce musical continuations, perform symbolic classification tasks, and produce general-purpose contrastive MIDI embeddings by adapting the SimCLR framework to symbolic music. When evaluating piano continuation coherence, our generative model outperforms leading symbolic generation techniques and remains competitive with proprietary audio generation models. On MIR classification benchmarks, frozen representations from our contrastive model achieve state-of-the-art results in linear probe experiments, while direct finetuning demonstrates the generalizability of pretrained representations, often requiring only a few hundred labeled examples to specialize to downstream tasks.
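For readers unfamiliar with SimCLR, the sketch below shows the contrastive loss it is generally associated with (often called NT-Xent) in plain Python. It is a generic illustration, not the authors' adaptation: how two views of a MIDI excerpt are produced, the temperature value, and the convention that items 2k and 2k+1 form a positive pair are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(embeddings, temperature=0.5):
    """Average contrastive loss over a batch of 2N embeddings.

    Assumed layout: embeddings[2k] and embeddings[2k + 1] are two views
    of the same excerpt; every other item in the batch is a negative.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the positive partner under the assumed layout
        positive = cosine(embeddings[i], embeddings[j]) / temperature
        # The denominator covers every other item, including the positive.
        others = [
            cosine(embeddings[i], embeddings[k]) / temperature
            for k in range(n) if k != i
        ]
        log_denominator = math.log(sum(math.exp(x) for x in others))
        total += log_denominator - positive
    return total / n


# Toy batch: paired views that agree, then the same vectors paired wrongly.
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

loss_aligned = nt_xent(aligned)
loss_misaligned = nt_xent(misaligned)
```

The loss is lower when the two views of each excerpt land close together and far from other excerpts, which is the property a contrastive embedding model is trained to achieve.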