dragon
DRAGON: Distributional Rewards Optimize Diffusion Generative Models
Bai, Yatong, Casebeer, Jonah, Sojoudi, Somayeh, Bryan, Nicholas J.
We present Distributional RewArds for Generative OptimizatioN (DRAGON), a versatile framework for fine-tuning media generation models towards a desired outcome. Compared with traditional reinforcement learning with human feedback (RLHF) or pairwise preference approaches such as direct preference optimization (DPO), DRAGON is more flexible. It can optimize reward functions that evaluate either individual examples or distributions of them, making it compatible with a broad spectrum of instance-wise, instance-to-distribution, and distribution-to-distribution rewards. Leveraging this versatility, we construct novel reward functions by selecting an encoder and a set of reference examples to create an exemplar distribution. When cross-modal encoders such as CLAP are used, the reference may be of a different modality (text versus audio). Then, DRAGON gathers online and on-policy generations, scores them with the reward function to construct a positive demonstration set and a negative set, and leverages the contrast between the two finite sets to approximate distributional reward optimization. For evaluation, we fine-tune an audio-domain text-to-music diffusion model with 20 reward functions, including a custom music aesthetics model, CLAP score, Vendi diversity, and Frechet audio distance (FAD). We further compare instance-wise (per-song) and full-dataset FAD settings while ablating multiple FAD encoders and reference sets. Over all 20 target rewards, DRAGON achieves an 81.45% average win rate. Moreover, reward functions based on exemplar sets enhance generations and are comparable to model-based rewards. With an appropriate exemplar set, DRAGON achieves a 60.95% human-voted music quality win rate without training on human preference annotations. DRAGON is a new approach to designing and optimizing reward functions for improving human-perceived quality. Demos at https://ml-dragon.github.io/web
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.46)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
- (2 more...)
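The contrastive procedure the abstract describes — scoring on-policy generations and splitting them into positive and negative demonstration sets — can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the reward function and the top-fraction split rule are stand-ins.

```python
import numpy as np

def split_demonstrations(generations, reward_fn, top_frac=0.5):
    """Score a batch of on-policy generations and split them into a
    positive (high-reward) and a negative (low-reward) demonstration set.
    `reward_fn` stands in for any instance-wise or exemplar-based reward;
    here it simply returns one scalar per generation."""
    scores = np.array([reward_fn(g) for g in generations])
    order = np.argsort(-scores)                      # best-scoring first
    k = max(1, int(len(generations) * top_frac))     # size of positive set
    positive = [generations[i] for i in order[:k]]
    negative = [generations[i] for i in order[k:]]
    return positive, negative

# Toy usage: reward = negative distance to a (hypothetical) exemplar at 0.0
gens = [3.0, -1.0, 0.5, 2.0]
pos, neg = split_demonstrations(gens, reward_fn=lambda g: -abs(g - 0.0))
```

The contrast between the two finite sets is then what approximates distributional reward optimization during fine-tuning.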
- North America > United States > Florida > Palm Beach County > West Palm Beach (0.04)
- North America > United States > Florida > Palm Beach County > Palm Beach (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Automobiles & Trucks > Manufacturer (0.94)
- Transportation > Ground > Road (0.94)
- Leisure & Entertainment > Sports (0.68)
Grounded Reinforcement Learning: Learning to Win the Game under Human Commands Supplementary Materials
In this section, we describe the details of the MiniRTS environment and the human dataset. … "spearman" but is restrained by "cavalry". "swordman", "spearman", and "cavalry" all are … Figure 2: Building units can produce different army units using resources. "workshop" can produce "archer", "dragon", and "catapult", while other … Resource Units: Resource units are stationary and neutral. They cannot be constructed by any player and are created at the beginning of a game. Building Units: MiniRTS supports 6 different building unit types.
- Leisure & Entertainment > Games (0.95)
- Government > Military > Army (0.38)
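The production relation in the snippet (a "workshop" producing "archer", "dragon", and "catapult") is naturally a building-to-units mapping. A minimal sketch, listing only what the snippet actually states; the other five building types are omitted rather than guessed:

```python
# Building-to-producible-units mapping in MiniRTS, restricted to what
# the snippet states; entries for the remaining building types are
# intentionally left out rather than invented.
PRODUCES = {
    "workshop": ["archer", "dragon", "catapult"],
}

def can_produce(building: str, unit: str) -> bool:
    """True if the given building type can produce the given army unit."""
    return unit in PRODUCES.get(building, [])

can_produce("workshop", "dragon")   # True
```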
A Further Discussion of Evaluation Methodologies
In previous research, there have been many arguments about textual backdoor evaluation, including diverse metrics and experiment settings. These valuable discussions motivate us to construct a rigorous benchmark, and we highly appreciate their efforts. In this section, we briefly summarize existing opinions and provide a more detailed discussion of this topic. For example, injecting a "cf" trigger … Similar to us, Qi et al. [36] measured … Few works have talked about validity in textual backdoor learning. For example, consider an attacker who wants to post negative movie reviews and bypass a poisoned sentiment analysis model.
- Media > Film (0.67)
- Information Technology > Security & Privacy (0.48)
DRAGON: Guard LLM Unlearning in Context via Negative Detection and Reasoning
Wang, Yaxuan, Liu, Chris Yuhao, Liu, Quan, Pang, Jinglong, Wei, Wei, Bao, Yujia, Liu, Yang
Unlearning in Large Language Models (LLMs) is crucial for protecting private data and removing harmful knowledge. Most existing approaches rely on fine-tuning to balance unlearning efficiency with general language capabilities. However, these methods typically require training on, or access to, retain data, which is often unavailable in real-world scenarios. Although these methods can perform well when both forget and retain data are available, few works have demonstrated equivalent capability in more practical, data-limited scenarios. To overcome these limitations, we propose Detect-Reasoning Augmented GeneratiON (DRAGON), a systematic, reasoning-based framework that utilizes in-context chain-of-thought (CoT) instructions to guard deployed LLMs before inference. Instead of modifying the base model, DRAGON leverages the inherent instruction-following ability of LLMs and introduces a lightweight detection module to identify forget-worthy prompts without any retain data. These prompts are then routed through a dedicated CoT guard model to enforce safe and accurate in-context intervention. To robustly evaluate unlearning, we introduce novel metrics for unlearning performance and for the continual unlearning setting. Extensive experiments across three representative unlearning tasks validate the effectiveness of DRAGON, demonstrating its strong unlearning capability, scalability, and applicability in practical scenarios.
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- Asia > Middle East > Kuwait > Capital Governorate > Kuwait City (0.04)
- Asia > Middle East > Jordan (0.04)
- (4 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government (0.92)
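The guard pipeline the abstract describes — a lightweight detector that flags forget-worthy prompts and routes them to a CoT guard model before the base LLM answers — can be sketched as a simple routing function. `is_forget_worthy`, `guard_model`, and `base_model` are placeholders for the paper's components, not its actual interfaces:

```python
def guarded_generate(prompt, base_model, guard_model, is_forget_worthy):
    """Route a prompt through an in-context unlearning guard.

    If the detector flags the prompt as touching forgotten content, a
    dedicated chain-of-thought guard model produces the (refusing or
    sanitized) answer; otherwise the unmodified base model responds.
    """
    if is_forget_worthy(prompt):
        # In-context intervention: the guard reasons step by step about
        # how to answer without revealing the forgotten knowledge.
        return guard_model(f"Reason step by step, then answer safely:\n{prompt}")
    return base_model(prompt)

# Toy usage with stub models and a hypothetical forget target ("Alice")
reply = guarded_generate(
    "Who is Alice Example?",
    base_model=lambda p: "base:" + p,
    guard_model=lambda p: "guarded",
    is_forget_worthy=lambda p: "Alice" in p,
)
```

The base model itself is never modified, which is what makes the approach applicable when retain data is unavailable.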
Design and Development of a Modular Bucket Drum Excavator for Lunar ISRU
Giel, Simon, Hurrell, James, Santra, Shreya, Mishra, Ashutosh, Uno, Kentaro, Yoshida, Kazuya
In-Situ Resource Utilization (ISRU) is one of the key technologies for enabling sustainable access to the Moon. The ability to excavate lunar regolith is the first step in making lunar resources accessible and usable. This work presents the development of a bucket drum for the modular robotic system MoonBot, as part of the Japanese Moonshot program. A 3D-printed prototype made of PLA was manufactured to evaluate its efficiency through a series of sandbox tests. The resulting tool weighs 4.8 kg and has a volume of 14.06 L. It is capable of continuous excavation at a rate of 777.54 kg/h with a normalized energy consumption of 0.022 Wh/kg. In batch operation, the excavation rate is 172.02 kg/h with a normalized energy consumption of 0.86 Wh per kilogram of excavated material. The obtained results demonstrate the successful implementation of the concept. A key advantage of the developed tool is its compatibility with the modular MoonBot robotic platform, which enables flexible and efficient mission planning. Further improvements may include the integration of sensors and an autonomous control system to enhance the excavation process.
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.05)
- North America > United States > Colorado > Denver County > Denver (0.04)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- (5 more...)
- Energy (0.69)
- Government (0.48)
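The reported figures admit a quick sanity check: excavation rate (kg/h) times normalized energy consumption (Wh/kg) gives Wh/h, i.e. average power in watts. A small check under that reading of the numbers (the power values are derived here, not stated in the abstract):

```python
def avg_power_w(rate_kg_per_h: float, energy_wh_per_kg: float) -> float:
    """Average power in W implied by an excavation rate and a
    normalized (specific) energy consumption: Wh/h equals W."""
    return rate_kg_per_h * energy_wh_per_kg

continuous = avg_power_w(777.54, 0.022)   # continuous mode, ~17.1 W
batch = avg_power_w(172.02, 0.86)         # batch mode, ~147.9 W
```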
'House of the Dragon' Actor's New Horror Game Skewers Hollywood
Abubakar Salim has a lot of beef with Hollywood--and he's getting it off his chest in his latest video game. The actor, known for his roles as Alyn of Hull on House of the Dragon and Father in Raised By Wolves, has been balancing his time between the big screen and gaming, two industries that have been affected by a slew of similar issues: long hours, shrinking jobs, abuse of power, and, more recently, the rapid rise of artificial intelligence use and generative AI. Salim's sophomore game, Dead Take, is a story of Hollywood, ambition, and exploitation, dressed up as a horror game that takes aim at his industry's problems, from corruption to AI use. "Hollywood is pure horror," Salim says. Dead Take is a firm departure from his debut game, Tales of Kenzera: Zau.
Leveraging Open-Source Large Language Models for Clinical Information Extraction in Resource-Constrained Settings
Builtjes, Luc, Bosma, Joeran, Prokop, Mathias, van Ginneken, Bram, Hering, Alessa
Medical reports contain rich clinical information but are often unstructured and written in domain-specific language, posing challenges for information extraction. While proprietary large language models (LLMs) have shown promise in clinical natural language processing, their lack of transparency and data-privacy concerns limit their utility in healthcare. This study therefore evaluates nine open-source generative LLMs on the DRAGON benchmark, which includes 28 clinical information extraction tasks in Dutch. We developed llm_extractinator, a publicly available framework for information extraction using open-source generative LLMs, and used it to assess model performance in a zero-shot setting. Several 14-billion-parameter models, Phi-4-14B, Qwen-2.5-14B, and DeepSeek-R1-14B, achieved competitive results, while the larger Llama-3.3-70B model achieved slightly higher performance at greater computational cost. Translation to English prior to inference consistently degraded performance, highlighting the need for native-language processing. These findings demonstrate that open-source LLMs, when used with our framework, offer effective, scalable, and privacy-conscious solutions for clinical information extraction in low-resource settings.
- North America > United States (0.04)
- Europe > Netherlands > Gelderland > Nijmegen (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.94)
- (2 more...)
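Zero-shot clinical information extraction of the kind benchmarked above typically reduces to prompting an instruction-following model with a task description and the report text, then parsing a structured answer. A generic sketch — this is not the llm_extractinator API, whose actual interface is not shown here:

```python
import json

def extract_fields(report: str, fields: list, llm) -> dict:
    """Zero-shot extraction: ask an instruction-following LLM to return
    the requested fields as JSON, then parse its answer. `llm` is any
    callable mapping a prompt string to a completion string."""
    prompt = (
        "Extract the following fields from the clinical report below. "
        "Answer with a JSON object only.\n"
        f"Fields: {', '.join(fields)}\n"
        f"Report: {report}"
    )
    return json.loads(llm(prompt))

# Toy usage with a stub standing in for an open-source LLM
stub = lambda p: '{"tumor_size_mm": 12}'
result = extract_fields("Lesion measuring 12 mm ...", ["tumor_size_mm"], stub)
```

In practice the parsing step needs to tolerate malformed model output (e.g. retry or fall back on a regex), which is one reason dedicated frameworks exist.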