Media
NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms
Wang, Yashan, Wu, Shangda, Hu, Jianhuai, Du, Xingjian, Peng, Yueqi, Huang, Yongxin, Fan, Shuai, Li, Xiaobing, Yu, Feng, Sun, Maosong
We introduce NotaGen, a symbolic music generation model aiming to explore the potential of producing high-quality classical sheet music. Inspired by the success of Large Language Models (LLMs), NotaGen adopts pre-training, fine-tuning, and reinforcement learning paradigms (henceforth referred to as the LLM training paradigms). It is pre-trained on 1.6M pieces of music in ABC notation, and then fine-tuned on approximately 9K high-quality classical compositions conditioned on "period-composer-instrumentation" prompts. For reinforcement learning, we propose the CLaMP-DPO method, which further enhances generation quality and controllability without requiring human annotations or predefined rewards. Our experiments demonstrate the efficacy of CLaMP-DPO in symbolic music generation models with different architectures and encoding schemes. Furthermore, subjective A/B tests show that NotaGen outperforms baseline models against human compositions, greatly advancing musical aesthetics in symbolic music generation.
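The abstract does not spell out the CLaMP-DPO objective, so the following is only a minimal sketch of the standard Direct Preference Optimization loss for a single preference pair, plus a hypothetical helper that builds a (chosen, rejected) pair from automatic scores. The `scores` argument stands in for a CLaMP-style similarity score; how NotaGen actually scores and pairs samples is an assumption here, as are the function names and the default `beta`.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair of sequence log-probabilities."""
    # Implicit reward margin: how much more strongly the policy prefers the
    # chosen sample over the rejected one, relative to the frozen reference.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)), split by sign for numerical stability.
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

def build_pair(samples, scores):
    """Hypothetical helper: highest-scored sample is chosen, lowest is rejected."""
    ranked = sorted(zip(scores, samples), key=lambda t: t[0])
    return ranked[-1][1], ranked[0][1]
```

Because the pair comes from an automatic scorer rather than from annotators, no human preference labels are needed, which is the property the abstract highlights.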
Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models
Yoo, Suho, Ok, Hyunjong, Lee, Jaeho
Language models pretrained on text-only corpora often struggle with tasks that require auditory commonsense knowledge. Previous work addresses this problem by augmenting the language model to retrieve knowledge from external audio databases. This approach has several limitations, such as the potential lack of relevant audio in databases and the high costs associated with constructing and querying the databases. To address these issues, we propose Imagine to Hear, a novel approach that dynamically generates auditory knowledge using generative models. Our framework detects multiple audio-related textual spans in the given prompt and generates corresponding auditory knowledge. We develop several mechanisms to efficiently process multiple pieces of auditory knowledge, including a CLAP-based rejection sampler and a language-audio fusion module. Our experiments show that our method achieves state-of-the-art performance on AuditoryBench without relying on external databases, highlighting the effectiveness of our generation-based approach.
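The abstract names a CLAP-based rejection sampler without describing it. One plausible reading, sketched below under stated assumptions, is to embed the text span and each generated audio candidate in a shared space and keep a candidate only if its cosine similarity clears a threshold. The embeddings are assumed to come from a CLAP-style joint text-audio encoder; the function names and the `threshold` value are hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rejection_sample(text_emb, candidate_embs, threshold=0.5):
    """Return the index of the most similar candidate that clears the
    threshold, or None if every candidate is rejected."""
    best_idx, best_sim = None, threshold
    for i, emb in enumerate(candidate_embs):
        sim = cosine(text_emb, emb)
        if sim >= best_sim:
            best_idx, best_sim = i, sim
    return best_idx
```

Returning `None` lets the caller fall back to text-only processing when no generated audio matches the span well enough.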
Robot wars: Nvidia unveils stunning Wall-E-style robot sparking Boston Dynamics to hit back with cartwheeling humanoid
For any homeowner, having a helpful robot companion around the home is the stuff of sci-fi-worthy dreams. But American tech firm Nvidia is now among the companies keen to make this a reality. In California on Tuesday, the chip giant unveiled Blue, a cute advanced AI-powered robot with two legs, just 3 feet tall. Footage shows Blue, which looks like the robot from the Pixar classic Wall-E, walk onto the stage as it's introduced by Nvidia CEO Jensen Huang. 'Tell me that wasn't amazing,' Huang says to the audience, as Blue waddles up to him with a similar gait to a duck.
Trump urged by Ben Stiller, Paul McCartney and hundreds of stars to protect AI copyright rules
The 'America's Got Talent' judge told Fox News Digital why he doesn't like AI technology in songwriting. "We firmly believe that America's global AI leadership must not come at the expense of our essential creative industries," the letter, addressed to Trump's Office of Science and Technology Policy and shared by Deadline and Variety, began. "America's arts and entertainment industry supports over 2.3M American jobs with over $229Bn in wages annually, while providing the foundation for American democratic influence and soft power abroad." The letter was submitted as part of comments on the Trump administration's U.S. AI Action Plan. "Access to America's creative catalog of films, writing, video content, and music is not a matter of national security.
Entity-aware Cross-lingual Claim Detection for Automated Fact-checking
Panchendrarajan, Rrubaa, Zubiaga, Arkaitz
Identifying claims requiring verification is a critical task in automated fact-checking, especially given the proliferation of misinformation on social media platforms. Despite significant progress in the task, there remain open challenges such as dealing with multilingual and multimodal data prevalent in online discourse. Addressing the multilingual challenge, recent efforts have focused on fine-tuning pre-trained multilingual language models. While these models can handle multiple languages, their ability to effectively transfer cross-lingual knowledge for detecting claims spreading on social media remains under-explored. In this paper, we introduce EX-Claim, an entity-aware cross-lingual claim detection model that generalizes well to handle claims written in any language. The model leverages entity information derived from named entity recognition and entity linking techniques to improve the language-level performance of both seen and unseen languages during training. Extensive experiments conducted on three datasets from different social media platforms demonstrate that our proposed model significantly outperforms the baselines, across 27 languages, and achieves the highest rate of knowledge transfer, even with limited training data.
Echoes of Power: Investigating Geopolitical Bias in US and China Large Language Models
Pacheco, Andre G. C., Cavalini, Athus, Comarela, Giovanni
In particular, the ChatGPT model (GPT-3.5 and GPT-4) [1] has demonstrated human-like conversational abilities, enabling it to engage in meaningful dialogues, answer questions, and generate text across a wide range of topics, including science, entertainment, and politics [13, 14, 20]. The ability of these models to generate coherent and contextually relevant text has made them a powerful tool for content creation and for enabling new forms of human-machine interaction. Despite their potential benefits, the widespread adoption of LLMs has raised concerns about their potential misuse, particularly in generating disinformation [16, 23, 25], fake news [11, 27], and hate speech [10, 22]. Beyond these widely recognized concerns, another critical issue has gained increasing attention in recent months: the potential of these models to manipulate public opinion, both due to the inherent biases embedded in their training process and the biases deliberately introduced or reinforced by their developers or maintainers. The most modern LLMs designed to interact with humans are generally trained in at least two phases. First, they are trained on large-scale text corpora, which inevitably incorporate the ideological, cultural, and political perspectives present in the source.
Deceptive Humor: A Synthetic Multilingual Benchmark Dataset for Bridging Fabricated Claims with Humorous Content
Kasu, Sai Kartheek Reddy, Biradar, Shankar, Saumya, Sunil
This paper presents the Deceptive Humor Dataset (DHD), a novel resource for studying humor derived from fabricated claims and misinformation. In an era of rampant misinformation, understanding how humor intertwines with deception is essential. DHD consists of humor-infused comments generated from false narratives, incorporating fabricated claims and manipulated information using the ChatGPT-4o model. Each instance is labeled with a Satire Level, ranging from 1 for subtle satire to 3 for high-level satire, and is classified into one of five distinct Humor Categories: Dark Humor, Irony, Social Commentary, Wordplay, and Absurdity. The dataset spans multiple languages including English, Telugu, Hindi, Kannada, Tamil, and their code-mixed variants (Te-En, Hi-En, Ka-En, Ta-En), making it a valuable multilingual benchmark. By introducing DHD, we establish a structured foundation for analyzing humor in deceptive contexts, paving the way for a new research direction that explores how humor not only interacts with misinformation but also influences its perception and spread. We establish strong baselines for the proposed dataset, providing a foundation for future research to benchmark and advance deceptive humor detection models.
DIPLI: Deep Image Prior Lucky Imaging for Blind Astronomical Image Restoration
Singh, Suraj, Batsheva, Anastasia, Rogov, Oleg Y., Bouridane, Ahmed
Contemporary image restoration and super-resolution techniques effectively harness deep neural networks, markedly outperforming traditional methods. However, astrophotography presents unique challenges for deep learning due to limited training data. This work explores hybrid strategies, such as the Deep Image Prior (DIP) model, which facilitates blind training but is susceptible to overfitting, artifact generation, and instability when handling noisy images. We propose enhancements to the DIP model's baseline performance through several advanced techniques. First, we refine the model to process multiple frames concurrently, employing the Back Projection method and the TVNet model. Next, we adopt a Markov approach incorporating Monte Carlo estimation, Langevin dynamics, and a variational input technique to achieve unbiased estimates with minimal variance and counteract overfitting effectively. Collectively, these modifications reduce the likelihood of noise learning and mitigate loss function fluctuations during training, enhancing result stability. We validated our algorithm across multiple image sets of astronomical and celestial objects, achieving performance that not only mitigates the limitations of Lucky Imaging, a classical computer vision technique that remains a standard in astronomical image reconstruction, but also surpasses the original DIP model and state-of-the-art transformer- and diffusion-based models, underscoring the significance of our improvements.
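For readers unfamiliar with the classical baseline named above, Lucky Imaging in its simplest form keeps only the sharpest short-exposure frames and averages them. The sketch below illustrates that baseline only, not DIPLI itself. It assumes frames are equal-sized 2D lists of floats, uses gradient energy as a hypothetical sharpness measure, and omits the frame registration step that a practical pipeline would need before stacking.

```python
def sharpness(frame):
    """Gradient energy: sum of squared horizontal and vertical differences."""
    h, w = len(frame), len(frame[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += (frame[y][x + 1] - frame[y][x]) ** 2
            if y + 1 < h:
                total += (frame[y + 1][x] - frame[y][x]) ** 2
    return total

def lucky_stack(frames, keep_fraction=0.1):
    """Average the sharpest `keep_fraction` of frames (always at least one)."""
    k = max(1, int(len(frames) * keep_fraction))
    best = sorted(frames, key=sharpness, reverse=True)[:k]
    h, w = len(best[0]), len(best[0][0])
    # Pixel-wise mean over the selected frames.
    return [[sum(f[y][x] for f in best) / k for x in range(w)]
            for y in range(h)]
```

Discarding most frames is what makes the method "lucky": only the moments of least atmospheric distortion contribute to the result.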
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
Ren, Richard, Agarwal, Arunim, Mazeika, Mantas, Menghini, Cristina, Vacareanu, Robert, Kenstler, Brad, Yang, Mick, Barrass, Isabelle, Gatti, Alice, Yin, Xuwang, Trevino, Eduardo, Geralnik, Matias, Khoja, Adam, Lee, Dean, Yue, Summer, Hendrycks, Dan
As large language models (LLMs) become more capable and agentic, the requirement for trust in their outputs grows significantly, yet at the same time concerns have been mounting that models may learn to lie in pursuit of their goals. To address these concerns, a body of work has emerged around the notion of "honesty" in LLMs, along with interventions aimed at mitigating deceptive behaviors. However, evaluations of honesty are currently highly limited, with no benchmark combining large scale and applicability to all models. Moreover, many benchmarks claiming to measure honesty in fact simply measure accuracy--the correctness of a model's beliefs--in disguise. In this work, we introduce a large-scale human-collected dataset for measuring honesty directly, allowing us to disentangle accuracy from honesty for the first time. Across a diverse set of LLMs, we find that while larger models obtain higher accuracy on our benchmark, they do not become more honest. Surprisingly, while most frontier LLMs obtain high scores on truthfulness benchmarks, we find a substantial propensity in frontier LLMs to lie when pressured to do so, resulting in low honesty scores on our benchmark. We find that simple methods, such as representation engineering interventions, can improve honesty. These results underscore the growing need for robust evaluations and effective interventions to ensure LLMs remain trustworthy.
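The distinction the abstract draws can be made concrete with a toy scorer: accuracy compares a model's belief with the ground truth, while honesty compares what the model says under pressure with its own belief. The sketch below is an illustration under simplifying assumptions, not the benchmark's actual scoring. The record keys (`ground_truth`, `belief`, `statement`) are hypothetical, and exact string match stands in for whatever judgment procedure the benchmark uses.

```python
def accuracy(records):
    """Fraction of records where the model's belief matches the ground truth."""
    return sum(r["belief"] == r["ground_truth"] for r in records) / len(records)

def honesty(records):
    """Fraction of records where the pressured statement matches the belief."""
    return sum(r["statement"] == r["belief"] for r in records) / len(records)

# Toy data: the model usually knows the answer but often contradicts itself.
records = [
    {"ground_truth": "yes", "belief": "yes", "statement": "no"},
    {"ground_truth": "yes", "belief": "yes", "statement": "no"},
    {"ground_truth": "yes", "belief": "no", "statement": "no"},
    {"ground_truth": "yes", "belief": "yes", "statement": "yes"},
]
```

In this toy data the model is accurate on three of four items but honest on only two, so a benchmark that checks statements against ground truth alone would blur the two properties.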