Oceania
Multi-Aspect Knowledge-Enhanced Medical Vision-Language Pretraining with Multi-Agent Data Generation
Li, Xieji, Yan, Siyuan, Liu, Yingsheng, Soyer, H. Peter, Janda, Monika, Mar, Victoria, Ge, Zongyuan
Vision-language pretraining (VLP) has emerged as a powerful paradigm in medical image analysis, enabling representation learning from large-scale image-text pairs without relying on expensive manual annotations. However, existing methods often struggle with the noise inherent in web-collected data and the complexity of unstructured long medical texts. To address these challenges, we propose a novel VLP framework integrating a Multi-Agent data GENeration (MAGEN) system and Ontology-based Multi-Aspect Knowledge-Enhanced (O-MAKE) pretraining. First, MAGEN enhances data quality by synthesizing knowledge-enriched descriptions via a foundation model-assisted captioning and retrieval-based verification pipeline. Second, O-MAKE addresses the difficulty of learning from long, unstructured texts by decomposing them into distinct knowledge aspects. This facilitates fine-grained alignment at both global and patch levels, while explicitly modeling medical concept relationships through ontology-guided mechanisms. We validate our framework in the field of dermatology, where comprehensive experiments demonstrate the effectiveness of each component. Our approach achieves state-of-the-art zero-shot performance on disease classification and cross-modal retrieval tasks across eight datasets. Our code and the augmented dataset Derm1M-AgentAug, comprising over 400k skin-image-text pairs, will be released at https://github.com/SiyuanYan1/Derm1M.
Multimodal Reinforcement Learning with Agentic Verifier for AI Agents
Tan, Reuben, Peng, Baolin, Yang, Zhengyuan, Cheng, Hao, Mees, Oier, Zhao, Theodore, Tupini, Andrea, Meijier, Isar, Wu, Qianhui, Yang, Yuncong, Liden, Lars, Gu, Yu, Zhang, Sheng, Liu, Xiaodong, Wang, Lijuan, Pollefeys, Marc, Lee, Yong Jae, Gao, Jianfeng
Agentic reasoning models trained with multimodal reinforcement learning (MMRL) have become increasingly capable, yet they are almost universally optimized using sparse, outcome-based rewards computed from the final answers. Richer rewards computed from the reasoning tokens can improve learning significantly by providing more fine-grained guidance. However, it is challenging to compute more informative rewards in MMRL beyond those based on outcomes, since different samples may require different scoring functions and teacher models may also provide noisy reward signals. In this paper, we introduce Argos (Agentic Reward for Grounded & Objective Scoring), a principled reward agent to train multimodal reasoning models for agentic tasks. For each sample, Argos selects from a pool of teacher-model derived and rule-based scoring functions to simultaneously evaluate: (i) final response accuracy, (ii) spatiotemporal localization of referred entities and actions, and (iii) the quality of the reasoning process. We find that by leveraging our agentic verifier across both SFT data curation and RL training, our model achieves state-of-the-art results across multiple agentic tasks such as spatial reasoning and visual hallucination, as well as robotics and embodied AI benchmarks. Critically, we demonstrate that relying only on SFT post-training on highly curated reasoning data is insufficient, as agents invariably collapse to ungrounded solutions during RL without our online verification. We also show that our agentic verifier can help to reduce reward hacking in MMRL. Finally, we provide a theoretical justification for the effectiveness of Argos through the concept of Pareto optimality.
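The per-sample selection of scoring functions described above can be illustrated with a minimal sketch. This is not the Argos implementation: the sample layout, the function names (`exact_match`, `box_iou`, `select_scorers`, `reward`), the equal default weights, and the averaging rule are all assumptions made for illustration. Only rule-based scorers are shown; a teacher-model scorer for reasoning quality would plug in as another entry in the pool.

```python
from typing import Callable, Dict, Optional

# Hypothetical scorer type: maps a sample dict to a score in [0, 1].
Scorer = Callable[[dict], float]


def exact_match(sample: dict) -> float:
    """Rule-based outcome check: 1.0 if the final answer equals the reference."""
    return float(sample["answer"].strip().lower() == sample["reference"].strip().lower())


def box_iou(sample: dict) -> float:
    """Rule-based localization check: IoU of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = sample["pred_box"]
    bx1, by1, bx2, by2 = sample["ref_box"]
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def select_scorers(sample: dict) -> Dict[str, Scorer]:
    """Pick the scorers that apply to this sample (assumed selection rule)."""
    scorers: Dict[str, Scorer] = {"accuracy": exact_match}
    if "ref_box" in sample:  # only score localization when a reference box exists
        scorers["localization"] = box_iou
    return scorers


def reward(sample: dict, weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of the selected scores (assumed aggregation rule)."""
    scorers = select_scorers(sample)
    weights = weights or {}
    total = sum(weights.get(name, 1.0) for name in scorers)
    return sum(weights.get(name, 1.0) * fn(sample) for name, fn in scorers.items()) / total


if __name__ == "__main__":
    grounded = {"answer": "Cat", "reference": "cat",
                "pred_box": (0, 0, 2, 2), "ref_box": (1, 1, 3, 3)}
    print(round(reward(grounded), 4))
```

In this toy setup a correct answer with a poorly placed box earns less than a correct answer with no localization requirement, which is the kind of finer-grained signal the abstract contrasts with outcome-only rewards.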
LLM-Generated Ads: From Personalization Parity to Persuasion Superiority
Meguellati, Elyas, Civelli, Stefano, Han, Lei, Bernstein, Abraham, Sadiq, Shazia, Demartini, Gianluca
As large language models (LLMs) become increasingly capable of generating persuasive content, understanding their effectiveness across different advertising strategies becomes critical. This paper presents a two-part investigation examining LLM-generated advertising through two complementary lenses: (1) personality-based personalization and (2) psychological persuasion principles. In our first study (n=400), we tested whether LLMs could generate personalized advertisements tailored to specific personality traits (openness and neuroticism) and how their performance compared to human experts. Results showed that LLM-generated ads achieved statistical parity with human-written ads (51.1% vs. 48.9%, p > 0.05), with no significant performance differences for matched personalities. Building on these insights, our second study (n=800) shifted focus from individual personalization to universal persuasion, testing LLM performance across four foundational psychological principles: authority, consensus, cognition, and scarcity. AI-generated ads significantly outperformed human-created content, achieving a 59.1% preference rate (vs. 40.9%, p < 0.001), with the strongest performance in authority (63.0%) and consensus (62.5%) appeals. Qualitative analysis revealed that AI's advantage stems from crafting more sophisticated, aspirational messages and achieving superior visual-narrative coherence. Critically, this quality advantage proved robust: even after applying a 21.2 percentage point detection penalty when participants correctly identified AI origin, AI ads still outperformed human ads, and 29.4% of participants chose AI content despite knowing its origin. These findings demonstrate LLMs' evolution from parity in personalization to superiority in persuasive storytelling, with significant implications for advertising practice given LLMs' near-zero marginal cost and time requirements compared to human experts.
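As a rough sanity check on the reported significance levels, the sketch below runs a two-sided normal-approximation test of each preference share against 50%. It assumes each participant contributed exactly one independent choice, so that n equals the number of choices; the abstract does not state this, nor which test the authors used, so the resulting p-values are illustrative only. The function name `two_sided_p_vs_half` is hypothetical.

```python
import math
from typing import Tuple


def two_sided_p_vs_half(share: float, n: int) -> Tuple[float, float]:
    """z statistic and two-sided p-value for H0: true preference share = 0.5.

    Uses the normal approximation to the binomial and assumes n independent
    choices (an assumption, not something stated in the abstract).
    """
    standard_error = math.sqrt(0.25 / n)  # sqrt(p0 * (1 - p0) / n) with p0 = 0.5
    z = (share - 0.5) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    for label, share, n in [("study 1", 0.511, 400), ("study 2", 0.591, 800)]:
        z, p = two_sided_p_vs_half(share, n)
        print(f"{label}: z = {z:.2f}, p = {p:.2g}")
```

Under these assumptions the first share is not distinguishable from 50% and the second is, which agrees with the direction of the reported p > 0.05 and p < 0.001.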
The BEAT-CF Causal Model: A model for guiding the design of trials and observational analyses of cystic fibrosis exacerbations
Mascaro, Steven, Woodberry, Owen, McLeod, Charlie, Messer, Mitch, Selvadurai, Hiran, Wu, Yue, Schultz, Andre, Snelling, Thomas L
Loss of lung function in cystic fibrosis (CF) occurs progressively, punctuated by acute pulmonary exacerbations (PEx) in which abrupt declines in lung function are not fully recovered. A key component of CF management over the past half century has been the treatment of PEx to slow lung function decline. This has been credited with improvements in survival for people with CF (PwCF), but there is no consensus on the optimal approach to PEx management. BEAT-CF (Bayesian evidence-adaptive treatment of CF) was established to build an evidence-informed knowledge base for CF management. The BEAT-CF causal model is a directed acyclic graph (DAG) and Bayesian network (BN) for PEx that aims to inform the design and analysis of clinical trials comparing the effectiveness of alternative approaches to PEx management. The causal model describes relationships between background risk factors, treatments, and pathogen colonisation of the airways that affect the outcome of an individual PEx episode. The key factors, outcomes, and causal relationships were elicited from CF clinical experts and together represent current expert understanding of the pathophysiology of a PEx episode, guiding the design of data collection and studies and enabling causal inference. Here, we present the DAG that documents this understanding, along with the processes used in its development, providing transparency around our trial design and study processes, as well as a reusable framework for others.
Banquet, Royal Family and Starmer on first day of German state visit
The Royal Family hosted the first German state visit to the UK in 27 years - with a state banquet and ceremonial events in Windsor. The Prince and Princess of Wales met Frank-Walter Steinmeier on the tarmac at Heathrow, before King Charles hosted him at a glittering, Christmassy state banquet at Windsor Castle. In a speech delivered in both English and German, the King welcomed the President and his wife, as well as the 150 other guests, who included Prime Minister Sir Keir Starmer. In response, President Steinmeier said the King's first visit abroad as monarch to Germany in 2023 was a special symbol of the German-English friendship.
The Age-Gated Internet Is Sweeping the US. Activists Are Fighting Back
Half of the country now requires age verification to watch porn or access "harmful" content. Digital rights advocates are pushing back against legislation they say will make the internet less safe. To prove you're an adult, you may have to upload your ID or submit to an age-verifying face scan. Members of Congress considered 19 online safety bills Tuesday that may soon have a major impact on the future of the internet as age-verification laws have spread to half of the US and around the world.
Nike, Superdry and Lacoste ads banned over misleading green claims
Adverts for Nike, Superdry and Lacoste have been banned for making misleading claims about their green credentials. The UK's advertising watchdog challenged the brands over the use of the word sustainable in paid-for Google ads which were not backed up by evidence of their sustainability. The Advertising Standards Authority (ASA) identified three adverts from the retailers promising customers sustainable materials, sustainable style and sustainable clothing. The UK's advertising code states that the basis of claims about environmental sustainability must be clear and supported by a high level of substantiation. In each case, it asked the companies for evidence to back up the claims about the sustainability of the products.
DialogGuard: Multi-Agent Psychosocial Safety Evaluation of Sensitive LLM Responses
Large language models (LLMs) now mediate many web-based mental-health, crisis, and other emotionally sensitive services, yet their psychosocial safety in these settings remains poorly understood and weakly evaluated. We present DialogGuard, a multi-agent framework for assessing psychosocial risks in LLM-generated responses along five high-severity dimensions: privacy violations, discriminatory behaviour, mental manipulation, psychological harm, and insulting behaviour. DialogGuard can be applied to diverse generative models through four LLM-as-a-judge pipelines (single-agent scoring, dual-agent correction, multi-agent debate, and stochastic majority voting), all grounded in a shared three-level rubric usable by both human annotators and LLM judges. Using PKU-SafeRLHF with human safety annotations, we show that multi-agent mechanisms detect psychosocial risks more accurately than non-LLM baselines and single-agent judging; dual-agent correction and majority voting provide the best trade-off between accuracy, alignment with human ratings, and robustness, while debate attains higher recall but over-flags borderline cases. We release DialogGuard as open-source software with a web interface that provides per-dimension risk scores and explainable natural-language rationales. A formative study with 12 practitioners illustrates how it supports prompt design, auditing, and supervision of web-facing applications for vulnerable users.
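The stochastic majority voting pipeline named above can be sketched as follows. This is an illustration, not DialogGuard's code: the encoding of the three-level rubric as integers 0 to 2, the judge signature, the number of samples, and the tie-break toward the higher risk level are all assumptions, and `toy_judge` is a stand-in for a sampled LLM judge.

```python
import random
from collections import Counter
from typing import Callable, List

# Hypothetical judge type: returns a rubric level 0, 1 or 2 for a response.
Judge = Callable[[str, random.Random], int]


def majority_vote(levels: List[int]) -> int:
    """Most frequent rubric level; ties resolve to the higher (riskier) level."""
    counts = Counter(levels)
    top_count = max(counts.values())
    return max(level for level, count in counts.items() if count == top_count)


def stochastic_majority(judge: Judge, response: str,
                        n_samples: int = 5, seed: int = 0) -> int:
    """Sample the judge several times and aggregate by majority vote."""
    rng = random.Random(seed)
    return majority_vote([judge(response, rng) for _ in range(n_samples)])


def toy_judge(response: str, rng: random.Random) -> int:
    """Stand-in for an LLM judge: a noisy keyword rule, for illustration only."""
    base = 2 if "worthless" in response.lower() else 0
    # With probability 0.2 the toy judge drifts to the middle level.
    return 1 if rng.random() < 0.2 else base


if __name__ == "__main__":
    print(stochastic_majority(toy_judge, "You are worthless.", n_samples=7))
```

Resolving ties toward the higher level is a conservative choice for a safety setting; a deployment that cares more about over-flagging could break ties the other way.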
CAIRNS: Balancing Readability and Scientific Accuracy in Climate Adaptation Question Answering
Kong, Liangji, Joshi, Aditya, Karimi, Sarvnaz
Climate adaptation strategies are proposed in response to climate change. They are practised in agriculture to sustain food production. These strategies can be found in unstructured data (for example, scientific literature from the Elsevier website) or structured data (for example, heterogeneous climate data via government APIs). We present Climate Adaptation question-answering with Improved Readability and Noted Sources (CAIRNS), a framework that enables experts -- farmer advisors -- to obtain credible preliminary answers from complex evidence sources from the web. It enhances readability and citation reliability through a structured ScholarGuide prompt and achieves robust evaluation via a consistency-weighted hybrid evaluator that leverages inter-model agreement with experts. Together, these components enable readable, verifiable, and domain-grounded question-answering without fine-tuning or reinforcement learning. Using a previously reported dataset of expert-curated question-answers, we show that CAIRNS outperforms the baselines on most of the metrics. Our thorough ablation study confirms the results on all metrics. To validate our LLM-based evaluation, we also report an analysis of correlations against human judgment.
Haaland joins Premier League 100 club - who else is in it?
Erling Haaland has become the 35th footballer to join the Premier League's '100 club' by reaching a century of goals in the competition with the opener in Manchester City's fixture at Fulham. The Norwegian striker has also become the fastest player to reach the milestone, with his 100th goal coming in his 111th game - beating Alan Shearer's previous record, set in 1995, by 13 appearances. Haaland's total goals-per-game rate in the Premier League is now 0.90, and he may eventually challenge for Shearer's overall goals record of 260 in the division, which launched in 1992-93 when it broke away from the English Football League. Haaland's City contract - a record nine-and-a-half-year deal announced in January - runs through to the end of the 2033-34 season.