
bf05b8d4361c6be8e250be4b924f0e1d-Paper-Conference.pdf

Neural Information Processing Systems

Finetuning large language models (LLMs) enables user-specific customization but introduces important safety risks: even a few harmful examples can compromise safety alignment. A common mitigation strategy is to update the model more strongly on examples deemed safe, while downweighting or excluding those flagged as unsafe. However, because safety context can shift within a single example, updating the model equally on the harmful and harmless parts of a response is suboptimal; we term this atomic treatment static safety shaping. In contrast, we propose dynamic safety shaping (DSS), a framework that uses fine-grained safety signals to reinforce learning from safe segments of a response while suppressing unsafe content. To enable such fine-grained control during finetuning, we introduce a key insight: guardrail models, traditionally used for filtering, can be repurposed to evaluate partial responses, tracking how safety risk evolves throughout the response, segment by segment. This leads to the Safety Trajectory Assessment of Response (STAR), a token-level signal that enables shaping to operate dynamically over the training sequence. Building on this, we present a DSS method guided by STAR scores that robustly mitigates finetuning risks and delivers substantial safety improvements across diverse threats, datasets, and model families, without compromising capability on intended tasks. We encourage future safety research to build on dynamic shaping principles for stronger mitigation against evolving finetuning risks.
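As a rough illustration of how a STAR-style signal could drive loss shaping, the sketch scores each growing prefix of a response with a guardrail and turns the resulting unsafe probability into per-token weights for the finetuning loss. Everything here is a hypothetical assumption for illustration, not the authors' exact objective: the keyword-based `toy_guardrail`, the pre-tokenized segments, the weighting rule `1 - P(unsafe)`, and the weighted-mean loss. A real setup would query an actual guardrail model and plug the weights into a framework's token-level loss.

```python
from typing import Callable, List, Sequence


def star_token_weights(
    segments: Sequence[Sequence[str]],
    guardrail_unsafe_prob: Callable[[str], float],
) -> List[float]:
    """One weight per token, equal to 1 - P(unsafe) of the partial response
    that ends with that token's segment (hypothetical weighting rule)."""
    weights: List[float] = []
    prefix: List[str] = []
    for segment in segments:
        prefix.extend(segment)
        # Score the response so far, not the segment in isolation, so the
        # signal tracks how risk evolves along the trajectory.
        p_unsafe = guardrail_unsafe_prob(" ".join(prefix))
        weights.extend([1.0 - p_unsafe] * len(segment))
    return weights


def shaped_loss(token_nll: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-token negative log-likelihoods."""
    if len(token_nll) != len(weights):
        raise ValueError("expected exactly one weight per token")
    total = sum(weights)
    if total == 0.0:
        return 0.0  # every token judged unsafe: nothing to learn from
    return sum(w * nll for w, nll in zip(weights, token_nll)) / total


def toy_guardrail(text: str) -> float:
    """Hypothetical stand-in for a guardrail model: keyword match only."""
    return 0.875 if "explosive" in text.lower() else 0.125


if __name__ == "__main__":
    segments = [
        ["Sure,", "here", "is", "some", "chemistry."],
        ["Step", "one:", "build", "an", "explosive."],
    ]
    weights = star_token_weights(segments, toy_guardrail)
    nll = [2.0] * 5 + [4.0] * 5  # made-up per-token losses
    print(weights)
    print(shaped_loss(nll, weights))
```

With these toy values the shaped loss is 2.25 rather than the unweighted mean of 3.0, because the tokens after the response turns harmful contribute with weight 0.125 instead of 0.875. Scoring prefixes rather than isolated segments means that once risk appears, later tokens stay down-weighted.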


WMCopier: Forging Invisible Watermarks on Arbitrary Images

Neural Information Processing Systems

Invisible image watermarking is crucial for ensuring content provenance and accountability in generative AI. While Gen-AI providers are increasingly integrating invisible watermarking systems, the robustness of these schemes against forgery attacks remains poorly characterized. This is critical, as forging traceable watermarks onto illicit content leads to false attribution, potentially harming the reputation and legal standing of Gen-AI service providers who are not responsible for the content. In this work, we propose WMCopier, an effective watermark forgery attack that operates without requiring any prior knowledge of or access to the target watermarking algorithm.


The Leaderboard Illusion

Neural Information Processing Systems

Measuring progress is fundamental to the advancement of any scientific field. As benchmarks play an increasingly central role, they also grow more susceptible to distortion. Chatbot Arena has emerged as the go-to leaderboard for ranking the most capable AI systems. Yet, in this work we identify systematic issues that have resulted in a distorted playing field. We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release and retract scores if desired. We establish that the ability of these providers to choose the best score leads to biased Arena scores due to selective disclosure of performance results.
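To see why choosing the best of several privately tested variants biases a published score, consider a minimal simulation in which every variant has the same true rating and only measurement noise differs. The rating of 1200, the noise level, and the Gaussian noise model are illustrative assumptions; this does not reproduce Chatbot Arena's actual rating method.

```python
import random
import statistics


def best_of_n_bias(true_score: float, noise_sd: float, n_variants: int,
                   trials: int, seed: int = 0) -> float:
    """Mean gap between the best observed score among n_variants and the true score.

    Illustrative assumption: each variant has the same true_score and its
    observed score adds independent Gaussian noise (not Arena's real method).
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # Every variant is equally strong; only the measurement noise differs.
        observed = [rng.gauss(true_score, noise_sd) for _ in range(n_variants)]
        # Only the best-looking variant is published.
        gaps.append(max(observed) - true_score)
    return statistics.fmean(gaps)


if __name__ == "__main__":
    # Hypothetical numbers: true rating 1200, noise of 10 rating points.
    print(best_of_n_bias(1200.0, 10.0, n_variants=1, trials=5000))
    print(best_of_n_bias(1200.0, 10.0, n_variants=10, trials=5000))
```

With a single variant the average gap stays near zero, while reporting the maximum of ten identical variants inflates the score by roughly 1.5 times `noise_sd`, even though no variant is actually better.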


Health Leaders Talk How AI Can Help Patients Be More Proactive

TIME - Tech

Pillay is an editorial fellow at TIME. America's healthcare system is notoriously reactive. Could AI shift it from a system that treats illness to one that prevents it? The question framed a panel discussion at the inaugural TIME100 AI Leadership Forum on May 27, which featured Dr. Omar Lateef, the president and CEO of Rush University System for Health; Arianna Huffington, the founder and CEO of Thrive Global; and Neil Lindsay, senior vice president of Amazon Health Services (Amazon One Medical, an Amazon health service, was an event sponsor). The conversation was moderated by TIME senior health correspondent Alice Park.


Report on foundation model impacts released

AIHub

Partnership on AI has published a progress report on post-deployment governance practices pertaining to foundation models. The document, entitled "2026 Transparency Report on Foundation Model Impacts", measures the progress of 13 foundation model providers* in publicly documenting the impacts of their foundation models. In carrying out their analysis, authors Jacob Pratt and Albert Tanjaya reviewed more than 150 papers, articles, websites, and reports. For assessment, the report's four practices were broken down into 19 processes, or activities, that support how foundation model providers adopt those practices. Although several leading organizations are defining what information to share and how, the rest are slow to adopt information-sharing practices.


Telehealth Abortion Is Still Possible Without Mifepristone

WIRED

Courts may restrict access to the popular abortion medication mifepristone in the United States. Telehealth providers have backup plans in place. Abortion provider Carafem's phones were ringing nonstop over the weekend after a US federal appeals court reinstated a nationwide requirement that the drug mifepristone, one of two pills used for a medication abortion, must be obtained in person. The decision, handed down on Friday, left patients unsure if they could gain access to their treatment through telehealth. "People are afraid, and they're angry," says Carafem's chief operations officer, Melissa Grant. "I had people contact us saying, …


Flat-rate AI plans are broken. Blame AI agents

PCWorld

PCWorld reports that major AI providers including Anthropic, Google, OpenAI, and GitHub are adjusting flat-rate subscription plans due to increased demand from agentic AI tools. Advanced AI agents like Google Antigravity and GitHub Copilot consume significantly more computational resources than traditional AI interactions, causing users to hit usage limits more frequently. The shift toward agentic workflows is forcing providers to introduce higher-tier plans, halt new sign-ups, and transition to usage-based models, fundamentally changing AI service accessibility. Remember when a $20-a-month "Pro" or "Plus" AI plan served up more AI access than you could possibly use? Ah, those were the days.


Amazon Health AI brings a doctor to your pocket

FOX News

Amazon Health AI is a new digital health assistant that answers medical questions, explains lab results and connects users with Amazon One Medical providers for care.