AITopics

2505.11953

Genre: Research Report > New Finding (0.92)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

McDuff, Daniel, Korjakow, Tim, Klyman, Kevin, Contractor, Danish

New Tools are Needed for Tracking Adherence to AI Model Behavioral Use Clauses

arXiv.org Artificial IntelligenceMay-29-2025

Foundation models have had a transformative impact on AI. A combination of large investments in research and development, growing sources of digital data for training, and architectures that scale with data and compute has led to models with powerful capabilities. Releasing assets is fundamental to scientific advancement and commercial enterprise. However, concerns over negligent or malicious uses of AI have led to the design of mechanisms to limit the risks of the technology. The result has been a proliferation of licenses with behavioral-use clauses and acceptable-use-policies that are increasingly being adopted by commonly used families of models (Llama, Gemma, Deepseek) and a myriad of smaller projects. We created and deployed a custom AI licenses generator to facilitate license creation and have quantitatively and qualitatively analyzed over 300 customized licenses created with this tool. Alongside this we analyzed 1.7 million models licenses on the HuggingFace model hub. Our results show increasing adoption of these licenses, interest in tools that support their creation and a convergence on common clause configurations. In this paper we take the position that tools for tracking adoption of, and adherence to, these licenses is the natural next step and urgently needed in order to ensure they have the desired impact of ensuring responsible use.

large language model, machine learning, natural language, (16 more...)

2505.22287

Genre: Research Report > New Finding (0.68)

Industry:

Law (1.00)
Health & Medicine (0.93)
Government (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Feger, Marc, Boland, Katarina, Dietze, Stefan

Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not Arguments

arXiv.org Artificial IntelligenceMay-29-2025

Identifying arguments is a necessary prerequisite for various tasks in automated discourse analysis, particularly within contexts such as political debates, online discussions, and scientific reasoning. In addition to theoretical advances in understanding the constitution of arguments, a significant body of research has emerged around practical argument mining, supported by a growing number of publicly available datasets. On these benchmarks, BERT-like transformers have consistently performed best, reinforcing the belief that such models are broadly applicable across diverse contexts of debate. This study offers the first large-scale re-evaluation of such state-of-the-art models, with a specific focus on their ability to generalize in identifying arguments. We evaluate four transformers, three standard and one enhanced with contrastive pre-training for better generalization, on 17 English sentence-level datasets as most relevant to the task. Our findings show that, to varying degrees, these models tend to rely on lexical shortcuts tied to content words, suggesting that apparent progress may often be driven by dataset-specific cues rather than true task alignment. While the models achieve strong results on familiar benchmarks, their performance drops markedly when applied to unseen datasets. Nonetheless, incorporating both task-specific pre-training and joint benchmark training proves effective in enhancing both robustness and generalization.

computational linguistic, information retrieval, machine learning, (14 more...)

2505.22137

Country:

Europe (1.00)
Asia > Middle East > UAE (0.46)
North America > United States > Minnesota (0.28)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Government (1.00)
Health & Medicine (0.94)
Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.48)

FOX NewsMay-28-2025, 17:36:37 GMT

A new law in this state bans automated insurance claim denials

'Ask Dr. Drew' host Dr. Drew Pinsky breaks down key takeaways from the MAHA Commission's chronic disease report on'The Ingraham Angle.' As some health insurance companies have come under fire for allegedly using computer systems to shoot down claims, an Arizona law will soon make the practice illegal in the Grand Canyon State. Republican Arizona House Majority Whip Rep. Julie Willoughby sponsored the legislation, and it was recently signed into law by Democratic Gov. Katie Hobbs. House Bill 2175 requires a physician licensed in the state to conduct an "individual review" and use "independent medical judgment" to determine whether the claim should actually be denied. It also required a similar review of "a direct denial of a prior authorization of a service" that a provider asked for and "involves medical necessity."

insurance claim denial, physician, willoughby, (11 more...)

FOX News

Country:

North America > United States > District of Columbia > Washington (0.06)
North America > United States > California (0.06)
North America > United States > Arizona > Pima County > Tucson (0.06)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance > Insurance (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)

MIT Technology ReviewMay-28-2025, 14:13:29 GMT

The AI Hype Index: College students are hooked on ChatGPT

That's why we've created the AI Hype Index--a simple, at-a-glance summary of everything you need to know about the state of the industry. Large language models confidently present their responses as accurate and reliable, even when they're neither of those things. That's why we've recently seen chatbots supercharge vulnerable people's delusions, make citation mistakes in an important legal battle between music publishers and Anthropic, and (in the case of xAI's Grok) rant irrationally about "white genocide." But it's not all bad news--AI could also finally lead to a better battery life for your iPhone and solve tricky real-world problems that humans have been struggling to crack, if Google DeepMind's new model is any indication. And perhaps most exciting of all, it could combine with brain implants to help people communicate when they have lost the ability to speak.

ai hype index, chatgpt, college student

MIT Technology Review

Industry:

Education > Educational Setting > Higher Education (0.76)
Law (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.76)

arXiv.org Machine LearningMay-28-2025

Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling

Cao, Yichuan, Miao, Yibo, Gao, Xiao-Shan, Dong, Yinpeng

Text-to-image (T2I) models raise ethical and safety concerns due to their potential to generate inappropriate or harmful images. Evaluating these models' security through red-teaming is vital, yet white-box approaches are limited by their need for internal access, complicating their use with closed-source models. Moreover, existing black-box methods often assume knowledge about the model's specific defense mechanisms, limiting their utility in real-world commercial API scenarios. A significant challenge is how to evade unknown and diverse defense mechanisms. To overcome this difficulty, we propose a novel Rule-based Preference modeling Guided Red-Teaming (RPG-RT), which iteratively employs LLM to modify prompts to query and leverages feedback from T2I systems for fine-tuning the LLM. RPG-RT treats the feedback from each iteration as a prior, enabling the LLM to dynamically adapt to unknown defense mechanisms. Given that the feedback is often labeled and coarse-grained, making it difficult to utilize directly, we further propose rule-based preference modeling, which employs a set of rules to evaluate desired or undesired feedback, facilitating finer-grained control over the LLM's dynamic adaptation process. Extensive experiments on nineteen T2I systems with varied safety mechanisms, three online commercial API services, and T2V models verify the superiority and practicality of our approach.

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2505.21074

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.93)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Reddy, Abbavaram Gowtham, Rubio-Madrigal, Celia, Burkholz, Rebekka, Muandet, Krikamol

When Shift Happens - Confounding Is to Blame

Distribution shifts introduce uncertainty that undermines the robustness and generalization capabilities of machine learning models. While conventional wisdom suggests that learning causal-invariant representations enhances robustness to such shifts, recent empirical studies present a counterintuitive finding: (i) empirical risk minimization (ERM) can rival or even outperform state-of-the-art out-of-distribution (OOD) generalization methods, and (ii) its OOD generalization performance improves when all available covariates, not just causal ones, are utilized. Drawing on both empirical and theoretical evidence, we attribute this phenomenon to hidden confounding. Shifts in hidden confounding induce changes in data distributions that violate assumptions commonly made by existing OOD generalization approaches. Under such conditions, we prove that effective generalization requires learning environment-specific relationships, rather than relying solely on invariant ones. Furthermore, we show that models augmented with proxies for hidden confounders can mitigate the challenges posed by hidden confounding shifts. These findings offer new theoretical insights and practical guidance for designing robust OOD generalization algorithms and principled covariate selection strategies.

artificial intelligence, covariate, machine learning, (13 more...)

2505.21422

Country:

North America > United States (0.93)
Europe (0.92)

Genre: Research Report > New Finding (0.45)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
(8 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Creativity in LLM-based Multi-Agent Systems: A Survey

Lin, Yi-Cheng, Chen, Kang-Chieh, Li, Zhe-Yan, Wu, Tzu-Heng, Wu, Tzu-Hsuan, Chen, Kuan-Yu, Lee, Hung-yi, Chen, Yun-Nung

Large language model (LLM)-driven multi-agent systems (MAS) are transforming how humans and AIs collaboratively generate ideas and artifacts. While existing surveys provide comprehensive overviews of MAS infrastructures, they largely overlook the dimension of \emph{creativity}, including how novel outputs are generated and evaluated, how creativity informs agent personas, and how creative workflows are coordinated. This is the first survey dedicated to creativity in MAS. We focus on text and image generation tasks, and present: (1) a taxonomy of agent proactivity and persona design; (2) an overview of generation techniques, including divergent exploration, iterative refinement, and collaborative synthesis, as well as relevant datasets and evaluation metrics; and (3) a discussion of key challenges, such as inconsistent evaluation standards, insufficient bias mitigation, coordination conflicts, and the lack of unified benchmarks. This survey offers a structured framework and roadmap for advancing the development, evaluation, and standardization of creative MAS.

artificial intelligence, machine learning, natural language, (18 more...)

2505.21116

Country:

North America > United States (1.00)
Asia (0.68)

Genre:

Research Report (1.00)
Overview (1.00)
Questionnaire & Opinion Survey (0.94)

Industry:

Law (0.68)
Education (0.67)
Leisure & Entertainment (0.67)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Reinforced Informativeness Optimization for Long-Form Retrieval-Augmented Generation

Wang, Yuhao, Ren, Ruiyang, Wang, Yucheng, Zhao, Wayne Xin, Liu, Jing, Wu, Hua, Wang, Haifeng

Long-form question answering (LFQA) presents unique challenges for large language models, requiring the synthesis of coherent, paragraph-length answers. While retrieval-augmented generation (RAG) systems have emerged as a promising solution, existing research struggles with key limitations: the scarcity of high-quality training data for long-form generation, the compounding risk of hallucination in extended outputs, and the absence of reliable evaluation metrics for factual completeness. In this paper, we propose RioRAG, a novel reinforcement learning (RL) framework that advances long-form RAG through reinforced informativeness optimization. Our approach introduces two fundamental innovations to address the core challenges. First, we develop an RL training paradigm of reinforced informativeness optimization that directly optimizes informativeness and effectively addresses the slow-thinking deficit in conventional RAG systems, bypassing the need for expensive supervised data. Second, we propose a nugget-centric hierarchical reward modeling approach that enables precise assessment of long-form answers through a three-stage process: extracting the nugget from every source webpage, constructing a nugget claim checklist, and computing rewards based on factual alignment. Extensive experiments on two LFQA benchmarks LongFact and RAGChecker demonstrate the effectiveness of the proposed method. Our codes are available at https://github.com/RUCAIBox/RioRAG.

large language model, machine learning, reinforcement learning, (20 more...)

2505.20825

Genre: Research Report > New Finding (0.93)

Industry:

Law (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems

Chen, Kai, Zhen, Taihang, Wang, Hewei, Liu, Kailai, Li, Xinfeng, Huo, Jing, Yang, Tianpei, Xu, Jinfeng, Dong, Wei, Gao, Yang

As large language models (LLMs) are increasingly deployed in healthcare, ensuring their safety, particularly within collaborative multi-agent configurations, is paramount. In this paper we introduce MedSentry, a benchmark comprising 5 000 adversarial medical prompts spanning 25 threat categories with 100 subthemes. Coupled with this dataset, we develop an end-to-end attack-defense evaluation pipeline to systematically analyze how four representative multi-agent topologies (Layers, SharedPool, Centralized, and Decentralized) withstand attacks from 'dark-personality' agents. Our findings reveal critical differences in how these architectures handle information contamination and maintain robust decision-making, exposing their underlying vulnerability mechanisms. For instance, SharedPool's open information sharing makes it highly susceptible, whereas Decentralized architectures exhibit greater resilience thanks to inherent redundancy and isolation. To mitigate these risks, we propose a personality-scale detection and correction mechanism that identifies and rehabilitates malicious agents, restoring system safety to near-baseline levels. MedSentry thus furnishes both a rigorous evaluation framework and practical defense strategies that guide the design of safer LLM-based multi-agent systems in medical domains.

artificial intelligence, deep learning, machine learning, (16 more...)

2505.20824

Country:

North America (0.46)
Asia (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(9 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)