Media
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Liu, Zhenyu, Li, Yunxin, Zhang, Xuanyu, Teng, Qixun, Jiang, Shenyuan, Chen, Xinyu, Shi, Haoyuan, Li, Jinchao, Wang, Qi, Chen, Haolan, Meng, Fanbo, Zhao, Mingjun, Xu, Yu, He, Yancheng, Hu, Baotian, Zhang, Min
Recent advances in unified multimodal models indicate a clear trend towards comprehensive content generation. However, the auditory domain remains a significant challenge, with music and speech often developed in isolation, hindering progress towards universal audio synthesis. This separation stems from inherent task conflicts and severe data imbalances, which impede the development of a truly unified audio generation model. To address this challenge, we propose UniMoE-Audio, a unified speech and music generation model within a novel Dynamic-Capacity Mixture-of-Experts (MoE) framework. Architecturally, UniMoE-Audio introduces a Top-P routing strategy for dynamic expert number allocation, and a hybrid expert design comprising routed experts for domain-specific knowledge, shared experts for domain-agnostic features, and null experts for adaptive computation skipping. To tackle data imbalance, we introduce a three-stage training curriculum: 1) Independent Specialist Training leverages original datasets to instill domain-specific knowledge into each "proto-expert" without interference; 2) MoE Integration and Warmup incorporates these specialists into the UniMoE-Audio architecture, warming up the gate module and shared expert using a subset of the balanced dataset; and 3) Synergistic Joint Training trains the entire model end-to-end on the fully balanced dataset, fostering enhanced cross-domain synergy. Extensive experiments show that UniMoE-Audio not only achieves state-of-the-art performance on major speech and music generation benchmarks, but also demonstrates superior synergistic learning, mitigating the performance degradation typically seen in naive joint training. Our findings highlight the substantial potential of specialized MoE architectures and curated training strategies in advancing the field of universal audio generation. Homepage: https://mukioxun.github.io/Uni-MoE-site/home.html
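The Top-P routing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `top_p_route`, the threshold value, and the weight renormalisation are assumptions made for illustration. It selects the smallest set of experts whose cumulative routing probability reaches a threshold p, so a confident token uses few experts and an uncertain token uses more.

```python
import math

def top_p_route(logits, p=0.7):
    """Pick the smallest set of experts whose cumulative routing
    probability reaches p. Hypothetical helper, not the paper's code."""
    # Softmax over expert logits, shifted by the max for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Visit experts from most to least probable, stopping once mass >= p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= p:
            break

    # Renormalise the selected probabilities so the weights sum to 1
    # (an assumption; the abstract does not say how weights are combined).
    weights = [probs[i] / mass for i in chosen]
    return chosen, weights

# A confident router output activates one expert; a flat one activates several.
confident_experts, _ = top_p_route([5.0, 0.0, 0.0, 0.0])
uncertain_experts, _ = top_p_route([0.0, 0.0, 0.0, 0.0])
```

Under this sketch, a null expert would simply be an expert index whose output is zero, so routing mass assigned to it skips computation.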
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
He, Yufei, Liu, Juncheng, Liu, Yue, Li, Yibo, Cao, Tri, Hu, Zhiyuan, Xu, Xinxing, Hooi, Bryan
A fundamental limitation of current AI agents is their inability to learn complex skills on the fly at test time, often behaving like "clever but clueless interns" in novel environments. This severely limits their practical utility. To systematically measure and drive progress on this challenge, we first introduce the Jericho Test-Time Learning (J-TTL) benchmark. J-TTL is a new evaluation setup where an agent must play the same game for several consecutive episodes, attempting to improve its performance from one episode to the next. On J-TTL, we find that existing adaptation methods like reflection, memory, or reinforcement learning struggle. To address the challenges posed by our benchmark, we present EvoTest, an evolutionary test-time learning framework that improves an agent without any fine-tuning or gradients, by evolving the entire agentic system after every episode. EvoTest has two roles: the Actor Agent, which plays the game, and the Evolver Agent, which analyzes the episode transcript to propose a revised configuration for the next run. This configuration rewrites the prompt, updates memory by logging effective state-action choices, tunes hyperparameters, and learns tool-use routines. On our J-TTL benchmark, EvoTest consistently increases performance, outperforming not only reflection and memory-only baselines but also more complex online fine-tuning methods. Notably, our method is the only one capable of winning two games (Detective and Library), while all baselines fail to win any.
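The Actor/Evolver loop described above can be sketched as follows. Everything here is a toy stand-in rather than the EvoTest code: `AgentConfig`, `toy_actor`, `toy_evolver`, and the one-step "door" game are hypothetical, and in the paper both roles are played by LLM agents rather than hand-written rules. The sketch only shows the control flow: play an episode, hand the transcript to the evolver, and use the revised configuration for the next episode, with no gradient updates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Transcript = List[Tuple[str, str, int]]  # (state, action, reward)

@dataclass
class AgentConfig:
    prompt: str
    memory: Dict[str, str] = field(default_factory=dict)  # logged state-action notes
    temperature: float = 1.0  # example hyperparameter; unused by the toy actor

def run_test_time_learning(actor, evolver, config, episodes):
    """Play `episodes` episodes, evolving the config after each one."""
    scores = []
    for _ in range(episodes):
        transcript = actor(config)
        scores.append(sum(reward for _, _, reward in transcript))
        config = evolver(config, transcript)  # no fine-tuning, no gradients
    return config, scores

ACTIONS = ["wait", "open"]

def toy_actor(config: AgentConfig) -> Transcript:
    # Hypothetical one-step game: "open" at the door scores 1, anything else 0.
    state = "door"
    avoided = config.memory.get("avoid:" + state)
    action = config.memory.get(state) or next(a for a in ACTIONS if a != avoided)
    return [(state, action, 1 if action == "open" else 0)]

def toy_evolver(config: AgentConfig, transcript: Transcript) -> AgentConfig:
    # Log effective state-action choices and remember the ones that failed.
    memory = dict(config.memory)
    for state, action, reward in transcript:
        if reward > 0:
            memory[state] = action
        else:
            memory["avoid:" + state] = action
    return AgentConfig(config.prompt, memory, config.temperature)

final_config, scores = run_test_time_learning(
    toy_actor, toy_evolver, AgentConfig(prompt="Play the game."), episodes=3
)
```

In this toy run the first episode fails, the evolver records the failed action, and later episodes succeed, which mirrors the episode-to-episode improvement that J-TTL is designed to measure.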
SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs
Ren, Juan, Dras, Mark, Naseem, Usman
Large Vision-Language Models (LVLMs) unlock powerful multimodal reasoning but also expand the attack surface, particularly through adversarial inputs that conceal harmful goals in benign prompts. We propose SHIELD, a lightweight, model-agnostic preprocessing framework that couples fine-grained safety classification with category-specific guidance and explicit actions (Block, Reframe, Forward). Unlike binary moderators, SHIELD composes tailored safety prompts that enforce nuanced refusals or safe redirection without retraining. Across five benchmarks and five representative LVLMs, SHIELD consistently lowers jailbreak and non-following rates while preserving utility. Our method is plug-and-play, incurs negligible overhead, and is easily extendable to new attack types -- serving as a practical safety patch for both weakly and strongly aligned LVLMs.
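A classifier-guided preprocessing step of the kind described above might look like the sketch below. The category names, the guidance strings, the `POLICY` table, and the keyword-based `keyword_classifier` are all assumptions for illustration; the abstract only states that a fine-grained classifier is paired with category-specific guidance and one of three actions (Block, Reframe, Forward).

```python
from enum import Enum
from typing import Callable, Tuple

class Action(Enum):
    BLOCK = "block"
    REFRAME = "reframe"
    FORWARD = "forward"

# Hypothetical category -> (action, guidance) table; not taken from the paper.
POLICY = {
    "weapons": (Action.BLOCK, "Decline this request and briefly explain why."),
    "self_harm": (Action.REFRAME, "Answer supportively and point to safe resources."),
    "benign": (Action.FORWARD, ""),
}

def keyword_classifier(text: str) -> str:
    """Stand-in for a fine-grained safety classifier (keyword match only)."""
    lowered = text.lower()
    if "bomb" in lowered:
        return "weapons"
    if "hurt myself" in lowered:
        return "self_harm"
    return "benign"

def guard(user_prompt: str,
          classifier: Callable[[str], str] = keyword_classifier) -> Tuple[Action, str]:
    """Return the chosen action and the prompt to send to the LVLM."""
    category = classifier(user_prompt)
    action, guidance = POLICY[category]
    if action is Action.FORWARD:
        return action, user_prompt  # benign input passes through unchanged
    # Block and Reframe both prepend category-specific guidance.
    return action, f"[Safety guidance: {guidance}]\n{user_prompt}"
```

Because the step only rewrites the prompt, it can sit in front of any model without retraining, which is the plug-and-play property the abstract emphasises.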
SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG
Si, Xiaonan, Zhu, Meilin, Qin, Simeng, Yu, Lijia, Zhang, Lijun, Liu, Shuaitong, Li, Xinfeng, Duan, Ranjie, Liu, Yang, Jia, Xiaojun
Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) with external knowledge but are vulnerable to corpus poisoning and contamination attacks, which can compromise output integrity. Existing defenses often apply aggressive filtering, leading to unnecessary loss of valuable information and reduced reliability in generation. To address this problem, we propose a two-stage semantic filtering and conflict-free framework for trustworthy RAG. In the first stage, we perform joint semantic and cluster-based filtering guided by the Entity-intent-relation extractor (EIRE). EIRE extracts entities, latent objectives, and entity relations from both the user query and filtered documents, scores their semantic relevance, and selectively adds valuable documents into the clean retrieval database. In the second stage, we propose an EIRE-guided conflict-aware filtering module, which analyzes semantic consistency between the query, candidate answers, and retrieved knowledge before final answer generation, filtering out internal and external contradictions that could mislead the model. Through this two-stage process, SeCon-RAG effectively preserves useful knowledge while mitigating conflict contamination, achieving significant improvements in both generation robustness and output trustworthiness. Extensive experiments across various LLMs and datasets demonstrate that the proposed SeCon-RAG markedly outperforms state-of-the-art defense methods.
AutoPR: Let's Automate Your Academic Promotion!
Chen, Qiguang, Yan, Zheng, Yang, Mingda, Qin, Libo, Yuan, Yixin, Li, Hanjing, Liu, Jinhao, Ji, Yiyan, Peng, Dengyun, Guan, Jiannan, Hu, Mengkang, Du, Yantao, Che, Wanxiang
As the volume of peer-reviewed research surges, scholars increasingly rely on social platforms for discovery, while authors invest considerable effort in promoting their work to ensure visibility and citations. To streamline this process and reduce the reliance on human effort, we introduce Automatic Promotion (AutoPR), a novel task that transforms research papers into accurate, engaging, and timely public content. To enable rigorous evaluation, we release PRBench, a multimodal benchmark that links 512 peer-reviewed articles to high-quality promotional posts, assessing systems along three axes: Fidelity (accuracy and tone), Engagement (audience targeting and appeal), and Alignment (timing and channel optimization). We also introduce PRAgent, a multi-agent framework that automates AutoPR in three stages: content extraction with multimodal preparation, collaborative synthesis for polished outputs, and platform-specific adaptation to optimize norms, tone, and tagging for maximum reach. When compared to direct LLM pipelines on PRBench, PRAgent demonstrates substantial improvements, including a 604% increase in total watch time, a 438% rise in likes, and at least a 2.9x boost in overall engagement. Ablation studies show that platform modeling and targeted promotion contribute the most to these gains. Our results position AutoPR as a tractable, measurable research problem and provide a roadmap for scalable, impactful automated scholarly communication.
Improving Zero-shot Sentence Decontextualisation with Content Selection and Planning
Deng, Zhenyun, Chen, Yulong, Vlachos, Andreas
Extracting individual sentences from a document as evidence or reasoning steps is commonly done in many NLP tasks. However, extracted sentences often lack context necessary to make them understood, e.g., coreference and background information. To this end, we propose a content selection and planning framework for zero-shot decontextualisation, which determines what content should be mentioned and in what order for a sentence to be understood out of context. Specifically, given a potentially ambiguous sentence and its context, we first segment it into basic semantically-independent units. We then identify potentially ambiguous units from the given sentence, and extract relevant units from the context based on their discourse relations. Finally, we generate a content plan to rewrite the sentence by enriching each ambiguous unit with its relevant units. Experimental results demonstrate that our approach is competitive for sentence decontextualisation, producing sentences that exhibit better semantic integrity and discourse coherence, outperforming existing methods.
SeLeRoSa: Sentence-Level Romanian Satire Detection Dataset
Smădu, Răzvan-Alexandru, Iuga, Andreea, Cercel, Dumitru-Clementin, Pop, Florin
Satire, irony, and sarcasm are techniques typically used to express humor and critique, rather than deceive; however, they can occasionally be mistaken for factual reporting, akin to fake news. These techniques can be applied at a more granular level, allowing satirical information to be incorporated into news articles. In this paper, we introduce the first sentence-level dataset for Romanian satire detection for news articles, called SeLeRoSa. The dataset comprises 13,873 manually annotated sentences spanning various domains, including social issues, IT, science, and movies. With the rise and recent progress of large language models (LLMs) in the natural language processing literature, LLMs have demonstrated enhanced capabilities to tackle various tasks in zero-shot settings. We evaluate multiple baseline models based on LLMs in both zero-shot and fine-tuning settings, as well as baseline transformer-based models. Our findings reveal the current limitations of these models in the sentence-level satire detection task, paving the way for new research directions.
Earth braces for impact as rare 'four-way' solar storm set to strike in just HOURS
A parade of four powerful bursts of solar energy is hurtling toward Earth, raising alarms about potential disruptions to technology and communications.
The National Oceanic and Atmospheric Administration (NOAA) has issued a moderate (G2) geomagnetic storm watch for Thursday, warning that power grids, radio signals and GPS navigation systems could be affected.
The Man Behind Two of the Greatest Albums of the Century Is Gone
The singer leaves behind two of the greatest albums of the century--and generations of artists still struggling to keep up. Great artists who are the opposite of prolific are always a thorny subject. Many of our most romantic ideas about creativity tend to view "genius" as a kind of vessel state, from which beauty and inspiration simply flow forth, effortlessly and boundlessly: It's deflating to be confronted with the reality that this isn't always how it works. And, of course, when such artists come to be the subjects of intense devotion and scrutiny, it often provokes a demand for more and more, faster and faster, which usually has the counterproductive effect of further pressurizing an already fraught creative process. And yet these artists are distinctively precious in their own way, necessary reminders (particularly in our age of pathological, parasocial standom) that even stars don't exist solely as objects for our consumption, that sharing a world with people who provide us with beautiful things is a privilege to be cherished and cared for, rather than an entitlement to be hoarded or otherwise fetishized.
Weather Channel gets jazzy, retro makeover from dedicated online fans
The free service offers retro graphics, smooth tunes, and up-to-date forecasts. The Weather Channel's accuracy has undoubtedly improved since the early days of cable TV, but the same can't necessarily be said about The Weather Channel's . That's not meant as an insult to the company's art design team--but there is simply no real match to that distinctly minimalist, retro-rudimentary look of forecasts from the 1980s, 90s, and early 2000s. Need further proof that there are those out there who yearn to return to the days of meteorology reports coupled with smooth jazz?