Law
Large Reasoning Models Are Autonomous Jailbreak Agents
Hagendorff, Thilo, Derner, Erik, Oliver, Nuria
Jailbreaking -- bypassing built-in safety mechanisms in AI models -- has traditionally required complex technical procedures or specialized human expertise. In this study, we show that the persuasive capabilities of large reasoning models (LRMs) simplify and scale jailbreaking, converting it into an inexpensive activity accessible to non-experts. We evaluated the capabilities of four LRMs (DeepSeek-R1, Gemini 2.5 Flash, Grok 3 Mini, Qwen3 235B) to act as autonomous adversaries conducting multi-turn conversations with nine widely used target models. LRMs received instructions via a system prompt, before proceeding to planning and executing jailbreaks with no further supervision. We performed extensive experiments with a benchmark of harmful prompts composed of 70 items covering seven sensitive domains. This setup yielded an overall attack success rate across all model combinations of 97.14%. Our study reveals an alignment regression, in which LRMs can systematically erode the safety guardrails of other models, highlighting the urgent need to further align frontier models not only to resist jailbreak attempts, but also to prevent them from being co-opted into acting as jailbreak agents.
Industrial LLM-based Code Optimization under Regulation: A Mixture-of-Agents Approach
Ashiga, Mari, Voskanyan, Vardan, Dinmohammadi, Fateme, Gong, Jingzhi, Brookes, Paul, Truscott, Matthew, Giavrimis, Rafail, Basios, Mike, Kanthan, Leslie, Jie, Wei
Recent advancements in Large Language Models (LLMs) for code optimization have enabled industrial platforms to automate software performance engineering at unprecedented scale and speed. Yet, organizations in regulated industries face strict constraints on which LLMs they can use - many cannot utilize commercial models due to data privacy regulations and compliance requirements, creating a significant challenge for achieving high-quality code optimization while maintaining cost-effectiveness. We address this by implementing a Mixture-of-Agents (MoA) approach that directly synthesizes code from multiple specialized LLMs, comparing it against TurinTech AI's vanilla Genetic Algorithm (GA)-based ensemble system and individual LLM optimizers using real-world industrial codebases. Our key contributions include: (1) First MoA application to industrial code optimization using real-world codebases; (2) Empirical evidence that MoA excels with open-source models, achieving 14.3% to 22.2% cost savings and 28.6% to 32.2% faster optimization times for regulated environments; (3) Deployment guidelines demonstrating GA's advantage with commercial models while both ensembles outperform individual LLMs; and (4) Real-world validation across 50 code snippets and seven LLM combinations, generating over 8,700 variants and addressing gaps in industrial LLM ensemble evaluation. This provides actionable guidance for organizations balancing regulatory compliance with optimization performance in production environments.
Can NLP Tackle Hate Speech in the Real World? Stakeholder-Informed Feedback and Survey on Counterspeech
Dinkar, Tanvi, Jiang, Aiqi, Frenda, Simona, Gerrard-Abbott, Poppy, Gunson, Nancie, Abercrombie, Gavin, Konstas, Ioannis
Counterspeech, i.e. the practice of responding to online hate speech, has gained traction in NLP as a promising intervention. While early work emphasised collaboration with non-governmental organisation stakeholders, recent research trends have shifted toward automated pipelines that reuse a small set of legacy datasets, often without input from affected communities. This paper presents a systematic review of 74 NLP studies on counterspeech, analysing the extent to which stakeholder participation influences dataset creation, model development, and evaluation. To complement this analysis, we conducted a participatory case study with five NGOs specialising in online Gender-Based Violence (oGBV), identifying stakeholder-informed practices for counterspeech generation. Our findings reveal a growing disconnect between current NLP research and the needs of communities most impacted by toxic online content. We conclude with concrete recommendations for re-centring stakeholder expertise in counterspeech research.
Data and AI governance: Promoting equity, ethics, and fairness in large language models
Abhishek, Alok, Erickson, Lisa, Bandopadhyay, Tushar
In this paper, we cover approaches to systematically govern, assess and quantify bias across the complete life cycle of machine learning models, from initial development and validation to ongoing production monitoring and guardrail implementation. Building upon our foundational work on the Bias Evaluation and Assessment Test Suite (BEATS) for Large Language Models, we share prevalent bias and fairness related gaps in Large Language Models (LLMs) and discuss a data and AI governance framework to address Bias, Ethics, Fairness, and Factuality within LLMs. The data and AI governance approach discussed in this paper is suitable for practical, real-world applications, enabling rigorous benchmarking of LLMs prior to production deployment, facilitating continuous real-time evaluation, and proactively governing LLM generated responses. By implementing data and AI governance across the life cycle of AI development, organizations can significantly enhance the safety and responsibility of their GenAI systems, effectively mitigating risks of discrimination and protecting against potential reputational or brand-related harm. Ultimately, through this article, we aim to contribute to advancing the creation and deployment of socially responsible and ethically aligned generative artificial intelligence powered applications.
Trustworthiness of Legal Considerations for the Use of LLMs in Education
Alaswad, Sara, Kalganova, Tatiana, Awad, Wasan
As Artificial Intelligence (AI), particularly Large Language Models (LLMs), becomes increasingly embedded in education systems worldwide, ensuring their ethical, legal, and contextually appropriate deployment has become a critical policy concern. This paper offers a comparative analysis of AI-related regulatory and ethical frameworks across key global regions, including the European Union, United Kingdom, United States, China, and Gulf Cooperation Council (GCC) countries. It maps how core trustworthiness principles, such as transparency, fairness, accountability, data privacy, and human oversight, are embedded in regional legislation and AI governance structures. Special emphasis is placed on the evolving landscape in the GCC, where countries are rapidly advancing national AI strategies and education-sector innovation. To support this development, the paper introduces a Compliance-Centered AI Governance Framework tailored to the GCC context. This includes a tiered typology and institutional checklist designed to help regulators, educators, and developers align AI adoption with both international norms and local values. By synthesizing global best practices with region-specific challenges, the paper contributes practical guidance for building legally sound, ethically grounded, and culturally sensitive AI systems in education. These insights are intended to inform future regulatory harmonization and promote responsible AI integration across diverse educational environments.
Health Insurance Coverage Rule Interpretation Corpus: Law, Policy, and Medical Guidance for Health Insurance Coverage Understanding
U.S. health insurance is complex, and inadequate understanding and limited access to justice have dire implications for the most vulnerable. Advances in natural language processing present an opportunity to support efficient, case-specific understanding, and to improve access to justice and healthcare. Yet existing corpora lack context necessary for assessing even simple cases. We collect and release a corpus of reputable legal and medical text related to U.S. health insurance. We also introduce an outcome prediction task for health insurance appeals designed to support regulatory and patient self-help applications, and release a labeled benchmark for our task, and models trained on it.
LinkQA: Synthesizing Diverse QA from Multiple Seeds Strongly Linked by Knowledge Points
Zhang, Xuemiao, Ren, Can, Tu, Chengying, Weng, Rongxiang, Yan, Hongfei, Wang, Jingang, Cai, Xunliang
The advancement of large language models (LLMs) struggles with the scarcity of high-quality, diverse training data. To address this limitation, we propose LinkSyn, a novel knowledge point (KP) graph-based synthesis framework that enables flexible control over discipline and difficulty distributions while balancing KP coverage and popularity. LinkSyn extracts KPs from question-answering (QA) seed data and constructs a KP graph to synthesize diverse QA data from multiple seeds strongly linked by KPs and sampled from graph walks. Specifically, LinkSyn incorporates (1) a knowledge distribution value function to guide the adjustment of path sampling probability and balance KP coverage and popularity during graph walks; (2) diffusion-based synthesis via DeepSeek-R1 by leveraging multiple seeds with dense logical associations along each path; and (3) high-difficulty QA enhancement within given disciplines by flexible difficulty adjustments. By executing LinkSyn, we synthesize LinkQA, a diverse multi-disciplinary QA dataset with 50B tokens. Extensive experiments on Llama-3 8B demonstrate that continual pre-training with LinkQA yields an average improvement of 11.51% on MMLU and CMMLU, establishing new SOTA results. LinkQA consistently enhances performance across model size and initial FLOPs scales.
Illinois' ban on AI therapy won't stop people from asking chatbots for help
Illinois has become the first state to enact legislation banning the use of AI tools like ChatGPT for providing therapy. The bill, signed into law by Governor J.B. Pritzker last Friday, comes amid growing research showing an increase in people experimenting with AI for mental health as the country faces a shortage of access to professional therapy services. The Wellness and Oversight for Psychological Resources Act, officially called HB 1806, prohibits healthcare providers from using AI for therapy and psychotherapy services. Specifically, it prevents AI chatbots or other AI-powered tools from interacting directly with patients, making therapeutic decisions, or creating treatment plans.
New tattoo sticker detects date rape drugs in 1 second
Checking your drink for drugs no longer needs to feel like a science experiment. Scientists in South Korea have created a new solution, a temporary tattoo sticker that instantly detects tampering. This simple sticker works fast, stays discreet, and offers surprisingly powerful protection. At first glance, it looks like ordinary skin art. The sticker detects GHB (gamma hydroxybutyrate), a drug commonly used to spike drinks.
Arts and media groups demand Labor take a stand against 'rampant theft' of Australian content to train AI
Arts, creative and media groups have demanded the government rule out allowing big tech companies to take Australian content to train their artificial intelligence models, with concerns such a shift would "sell out" Australian workers and lead to "rampant theft" of intellectual property. "It is not appropriate for big tech to steal the work of Australian artists, musicians, creators, news media, journalism, and use it for their own ends without paying for it," Ley said on Wednesday. In an interim report on "harnessing data and digital technology", the Productivity Commission set out proposals for how tech, including AI, could be regulated and treated in Australia, suggesting it could boost productivity by between 0.5% and 13% over the next decade, adding up to $116bn to Australia's GDP. The commission suggested several possible remedies, including expanding licensing schemes, or an exemption for "text and data mining" and expanding the existing fair dealing rules, which it said existed in other countries. The latter suggestion prompted fierce pushback from arts, creative and media companies, which raised alarm their work could be left open for massively wealthy tech companies to use – without compensation or payment – to train AI models.