Law
Magical: Medical Lay Language Generation via Semantic Invariance and Layperson-tailored Adaptation
Liao, Weibin, Wang, Tianlong, Zhu, Yinghao, Wang, Yasha, Gao, Junyi, Ma, Liantao
Medical Lay Language Generation (MLLG) plays a vital role in improving the accessibility of complex scientific content for broader audiences. Recent work on MLLG commonly employs parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA) to fine-tune large language models (LLMs) on paired expert-lay language datasets. However, LoRA struggles with the challenges posed by multi-source heterogeneous MLLG datasets. Specifically, through a series of exploratory experiments, we reveal that standard LoRA fails to meet the requirements for semantic fidelity and diverse lay-style generation in the MLLG task. To address these limitations, we propose Magical, an asymmetric LoRA architecture tailored for MLLG under heterogeneous data scenarios. Magical employs a shared matrix $A$ for abstractive summarization, along with multiple isolated matrices $B$ for diverse lay-style generation. To preserve semantic fidelity during the lay language generation process, Magical introduces a Semantic Invariance Constraint to mitigate semantic subspace shifts on matrix $A$. Furthermore, to better adapt to diverse lay-style generation, Magical incorporates the Recommendation-guided Switch, an external interface that prompts the LLM to switch between different matrices $B$. Experimental results on three real-world lay language generation datasets demonstrate that Magical consistently outperforms prompt-based methods, vanilla LoRA, and its recent variants, while also reducing trainable parameters by 31.66%. Our code is publicly available at https://github.com/tianlwang/Magical.git.
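To make the shared-$A$, multiple-$B$ layout described in the abstract above more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class name `AsymmetricLoRA`, the `style` index, the toy dimensions, and the zero initialization of each $B$ (borrowed from standard LoRA) are all assumptions for illustration, and the Semantic Invariance Constraint and Recommendation-guided Switch are not modeled.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class AsymmetricLoRA:
    """Toy layer: frozen W, one shared down-projection A, one up-projection B per style.

    Hypothetical illustration only; names and shapes are assumptions.
    """

    def __init__(self, d_in, d_out, rank, num_styles, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank, self.num_styles = d_in, d_out, rank, num_styles
        # Frozen base weight W with shape (d_out, d_in).
        self.W = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        # Shared matrix A with shape (rank, d_in), used by every style.
        self.A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        # One matrix B per style with shape (d_out, rank), zero-initialized
        # so the adapter starts as a no-op (assumption taken from standard LoRA).
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(num_styles)]

    def forward(self, x, style):
        """Compute W x + B_style (A x) for the selected style index."""
        base = matvec(self.W, x)
        low_rank = matvec(self.A, x)
        delta = matvec(self.B[style], low_rank)
        return [b + d for b, d in zip(base, delta)]

    def adapter_params(self):
        """Adapter parameters with a shared A: r*d_in + k*d_out*r."""
        return self.rank * self.d_in + self.num_styles * self.d_out * self.rank

    def separate_lora_params(self):
        """Adapter parameters if each style had its own A and B: k*(r*d_in + d_out*r)."""
        return self.num_styles * (self.rank * self.d_in + self.d_out * self.rank)


layer = AsymmetricLoRA(d_in=4, d_out=3, rank=2, num_styles=3)
x = [1.0, 0.5, -0.5, 2.0]
base_out = layer.forward(x, style=0)

# Give style 1 a non-zero B; style 0 is unaffected because the B matrices are isolated.
layer.B[1] = [[1.0] * layer.rank for _ in range(layer.d_out)]
style0_out = layer.forward(x, style=0)
style1_out = layer.forward(x, style=1)
```

Sharing $A$ is what lowers the adapter parameter count relative to one full LoRA pair per style in this toy setting; the specific reduction reported in the abstract depends on the authors' configuration and is not reproduced here.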
Modeling the Economic Impacts of AI Openness Regulation
Qiu, Tori, Laufer, Benjamin, Kleinberg, Jon, Heidari, Hoda
Regulatory frameworks, such as the EU AI Act, encourage openness of general-purpose AI models by offering legal exemptions for "open-source" models. Despite this legislative attention on openness, the definition of open-source foundation models remains ambiguous. This paper models the strategic interactions among the creator of a general-purpose model (the generalist) and the entity that fine-tunes the general-purpose model to a specialized domain or task (the specialist), in response to regulatory requirements on model openness. We present a stylized model of the regulator's choice of an open-source definition to evaluate which AI openness standards will establish appropriate economic incentives for developers. Our results characterize market equilibria -- specifically, upstream model release decisions and downstream fine-tuning efforts -- under various openness regulations and present a range of effective regulatory penalties and open-source thresholds. Overall, we find the model's baseline performance determines when increasing the regulatory penalty vs. the open-source threshold will significantly alter the generalist's release strategy. Our model provides a theoretical foundation for AI governance decisions around openness and enables evaluation and refinement of practical open-source policies.
Empirical Evidence for Alignment Faking in a Small LLM and Prompt-Based Mitigation Techniques
Current literature suggests that alignment faking (deceptive alignment) is an emergent property of large language models. We present the first empirical evidence that a small instruction-tuned model, specifically LLaMA 3 8B, can exhibit alignment faking. We further show that prompt-only interventions, including deontological moral framing and scratchpad reasoning, significantly reduce this behavior without modifying model internals. This challenges the assumption that prompt-based ethics are trivial and that deceptive alignment requires scale. We introduce a taxonomy distinguishing shallow deception, shaped by context and suppressible through prompting, from deep deception, which reflects persistent, goal-driven misalignment. Our findings refine the understanding of deception in language models and underscore the need for alignment evaluations across model sizes and deployment settings.
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards
Zheng, Jingnan, Ji, Xiangtian, Lu, Yijun, Cui, Chenhang, Zhao, Weixiang, Deng, Gelei, Liang, Zhenkai, Zhang, An, Chua, Tat-Seng
Large Language Models (LLMs) continue to exhibit vulnerabilities despite deliberate safety alignment efforts, posing significant risks to users and society. To safeguard against the risk of policy-violating content, system-level moderation via external guard models, which are designed to monitor LLM inputs and outputs and block potentially harmful content, has emerged as a prevalent mitigation strategy. Existing approaches to training guard models rely heavily on extensive human-curated datasets and struggle with out-of-distribution threats, such as emerging harmful categories or jailbreak attacks. To address these limitations, we propose RSafe, an adaptive reasoning-based safeguard that conducts guided safety reasoning to provide robust protection within the scope of specified safety policies. RSafe operates in two stages: 1) guided reasoning, where it analyzes safety risks of input content through policy-guided step-by-step reasoning, and 2) reinforced alignment, where rule-based RL optimizes its reasoning paths to align with accurate safety prediction. This two-stage training paradigm enables RSafe to internalize safety principles and generalize its safety protection capability to unseen or adversarial safety violation scenarios. During inference, RSafe accepts user-specified safety policies to provide enhanced safeguards tailored to specific safety requirements.
Mitigating Manipulation and Enhancing Persuasion: A Reflective Multi-Agent Approach for Legal Argument Generation
Large Language Models (LLMs) are increasingly explored for legal argument generation, yet they pose significant risks of manipulation through hallucination and ungrounded persuasion, and often fail to utilize provided factual bases effectively or abstain when arguments are untenable. This paper introduces a novel reflective multi-agent method designed to address these challenges in the context of legally compliant persuasion. Our approach employs specialized agents (factor analyst and argument polisher) in an iterative refinement process to generate 3-ply legal arguments (plaintiff, defendant, rebuttal). We evaluate the reflective multi-agent method against single-agent, enhanced-prompt single-agent, and non-reflective multi-agent baselines using four diverse LLMs (GPT-4o, GPT-4o-mini, Llama-4-Maverick-17b-128e, Llama-4-Scout-17b-16e) across three legal scenarios: "arguable", "mismatched", and "non-arguable". Results demonstrate that the reflective multi-agent approach excels at successful abstention by preventing generation when arguments cannot be grounded, improves hallucination accuracy by reducing fabricated and misattributed factors, and enhances factor utilization recall by better using the provided case facts. These findings suggest that structured reflection within a multi-agent framework offers a robust method for fostering ethical persuasion and mitigating manipulation in LLM-based legal argumentation systems.
Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers
Nam, Andrew, Conklin, Henry, Yang, Yukang, Griffiths, Thomas, Cohen, Jonathan, Leslie, Sarah-Jane
We present causal head gating (CHG), a scalable method for interpreting the functional roles of attention heads in transformer models. CHG learns soft gates over heads and assigns them a causal taxonomy - facilitating, interfering, or irrelevant - based on their impact on task performance. Unlike prior approaches in mechanistic interpretability, which are hypothesis-driven and require prompt templates or target labels, CHG applies directly to any dataset using standard next-token prediction. We evaluate CHG across multiple large language models (LLMs) in the Llama 3 model family and diverse tasks, including syntax, commonsense, and mathematical reasoning, and show that CHG scores yield causal, not merely correlational, insight validated via ablation and causal mediation analyses. We also introduce contrastive CHG, a variant that isolates sub-circuits for specific task components. Our findings reveal that LLMs contain multiple sparse task-sufficient sub-circuits, that individual head roles depend on interactions with others (low modularity), and that instruction following and in-context learning rely on separable mechanisms.
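As a rough illustration of the facilitating / interfering / irrelevant taxonomy named in the abstract above, the sketch below uses a deliberately simplified stand-in: each head contributes a scalar, gates scale those contributions, and a head's role is read off from how the loss changes when its gate is closed. This ablation-style proxy is an assumption made for illustration and is not the CHG fitting procedure; the function names, the squared-error loss, and the toy numbers are all hypothetical.

```python
import math


def sigmoid(z):
    """Map a gate logit to a soft gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_output(head_outputs, gate_logits):
    """Sum the per-head scalar contributions, each scaled by its soft gate."""
    gates = [sigmoid(z) for z in gate_logits]
    return sum(g * h for g, h in zip(gates, head_outputs))


def squared_error(head_outputs, gates, target):
    """Toy task loss: squared error of the gated sum against a target."""
    prediction = sum(g * h for g, h in zip(gates, head_outputs))
    return (prediction - target) ** 2


def classify_heads(head_outputs, target, tol=1e-9):
    """Label each head by the loss change when its gate is closed (proxy, not CHG itself)."""
    n = len(head_outputs)
    full_loss = squared_error(head_outputs, [1.0] * n, target)
    roles = []
    for i in range(n):
        gates = [1.0] * n
        gates[i] = 0.0  # close only this head's gate
        delta = squared_error(head_outputs, gates, target) - full_loss
        if delta > tol:
            roles.append("facilitating")  # removing the head hurts the task
        elif delta < -tol:
            roles.append("interfering")   # removing the head helps the task
        else:
            roles.append("irrelevant")    # removing the head changes nothing
    return roles


# Hypothetical toy values: three heads with scalar contributions.
head_outputs = [2.0, -0.5, 0.0]
roles = classify_heads(head_outputs, target=2.0)
half_open = gated_output(head_outputs, [0.0, 0.0, 0.0])
```

In this toy run, closing the first head moves the prediction away from the target, closing the second moves it onto the target, and closing the third changes nothing, which is what drives the three labels.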
Labor rules out giving tech giants free rein to mine copyright content to train AI
The Albanese government has explicitly ruled out handing tech companies free rein to mine creative content to train their artificial intelligence models, after a fierce backlash from authors and arts and media groups. The attorney general, Michelle Rowland, will confirm the decision on Monday, shutting the door on a contentious proposal floated by the Productivity Commission and backed by tech companies. "Australian creatives are not only world class, but they are also the lifeblood of Australian culture, and we must ensure the right legal protections are in place," Rowland said.
Strings attached to bills Newsom signed on antisemitism, AI transparency and other major California policies
California will be the first state to ban most law enforcement, including federal immigration agents, from covering their faces while conducting official business under a bill signed by Gov. Gavin Newsom on Saturday. SACRAMENTO -- Though hailed by some for signing new laws to combat antisemitism in California schools, Gov. Gavin Newsom expressed enough reservations about the bills to urge state lawmakers to make some changes.
Bloody Mary, Bloody Mary, Bloody Mary: How the classic sleepover party game really CAN summon a ghost in your mirror
Turn off the lights, burn a candle, look into the mirror and say the magic words: 'Bloody Mary, Bloody Mary, Bloody Mary'.