AITopics

2505.17072

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.45)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

The Atlantic - Technology, May-30-2025, 11:30:00 GMT

OpenAI Can Stop Pretending

OpenAI is a strange company for strange times. Valued at $300 billion--roughly the same as seven Fords or one and a half PepsiCos--the AI start-up has an era-defining product in ChatGPT and is racing to be the first to build superintelligent machines. The company is also, to the apparent frustration of its CEO Sam Altman, beholden to its nonprofit status. When OpenAI was founded in 2015, it was meant to be a research lab that would work toward the goal of AI that is "safe" and "benefits all of humanity." There wasn't supposed to be any pressure--or desire, really--to make money.

altman, openai, public-benefit corporation, (15 more...)

Country:

North America > United States > California (0.15)
North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > UAE (0.04)

Industry:

Law (1.00)
Information Technology (0.96)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment

Halloran, John

The model context protocol (MCP) has been widely adopted as an open standard enabling the seamless integration of generative AI agents. However, recent work has shown the MCP is susceptible to retrieval-based "falsely benign" attacks (FBAs), allowing malicious system access and credential theft, but requiring that users download compromised files directly to their systems. Herein, we show that the threat model of MCP-based attacks is significantly broader than previously thought, i.e., attackers need only post malicious content online to deceive MCP agents into carrying out their attacks on unsuspecting victims' systems. To improve alignment guardrails against such attacks, we introduce a new MCP dataset of FBAs and (truly) benign samples to explore the effectiveness of direct preference optimization (DPO) for the refusal training of large language models (LLMs). While DPO improves model guardrails against such attacks, we show that the efficacy of refusal learning varies drastically depending on the model's original post-training alignment scheme--e.g., GRPO-based LLMs learn to refuse extremely poorly. Thus, to further improve FBA refusals, we introduce Retrieval Augmented Generation for Preference alignment (RAG-Pref), a novel preference alignment strategy based on RAG. We show that RAG-Pref significantly improves the ability of LLMs to refuse FBAs, particularly when combined with DPO alignment, thus drastically improving guardrails against MCP-based attacks.
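For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-example DPO objective, not the paper's training code. It assumes that, for a falsely benign prompt, the "chosen" response is a refusal and the "rejected" response is compliance; that pairing, the function name `dpo_loss`, and the default `beta` are illustrative assumptions, since the abstract does not give these details.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # How much more the policy favors each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Assumed pairing: "chosen" is a refusal, "rejected" is compliance with the attack.
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)   # policy identical to reference
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)   # policy shifted toward the refusal
print(baseline, improved)
```

When the policy matches the reference the loss equals log 2, and it falls as the policy assigns relatively more probability to the chosen response than the reference does.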

large language model, machine learning, natural language, (18 more...)

2505.23634

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Security Benefits and Side Effects of Labeling AI-Generated Images

Höltervennhoff, Sandra, Ricker, Jonas, Raphael, Maike M., Schwedes, Charlotte, Weil, Rebecca, Fischer, Asja, Holz, Thorsten, Schönherr, Lea, Fahl, Sascha

Generative artificial intelligence is developing rapidly, impacting humans' interaction with information and digital media. It is increasingly used to create deceptively realistic misinformation, so lawmakers have imposed regulations requiring the disclosure of AI-generated content. However, little is known about whether these labels reduce the risks of AI-generated misinformation. Our work addresses this research gap. Focusing on AI-generated images, we study the implications of labels, including the possibility of mislabeling. Assuming that simplicity, transparency, and trust are likely to impact the successful adoption of such labels, we first qualitatively explore users' opinions and expectations of AI labeling using five focus groups. Second, we conduct a pre-registered online survey with over 1300 U.S. and EU participants to quantitatively assess the effect of AI labels on users' ability to recognize misinformation containing either human-made or AI-generated images. Our focus groups illustrate that, while participants have concerns about the practical implementation of labeling, they consider it helpful in identifying AI-generated images and avoiding deception. However, considering security benefits, our survey revealed an ambiguous picture, suggesting that users might over-rely on labels. While inaccurate claims supported by labeled AI-generated images were rated less credible than those with unlabeled AI-generated images, the belief in accurate claims also decreased when accompanied by a labeled AI-generated image. Moreover, we find the undesired side effect that human-made images conveying inaccurate claims were perceived as more credible in the presence of labels.

artificial intelligence, machine learning, natural language, (16 more...)

2505.22845

Country:

Europe (1.00)
North America > United States > California (0.28)
Asia > Middle East > Syria (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
Personal > Interview (0.67)

Industry:

Transportation (1.00)
Media > News (1.00)
Leisure & Entertainment (1.00)
(7 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
(2 more...)

Fantin, Luca, Antonelli, Marco, Cesetti, Margherita, Irto, Daniele, Zamengo, Bruno, Silvestri, Francesco

Design and testing of an agent chatbot supporting decision making with public transport data

Assessing the quality of public transportation services requires the analysis of large quantities of data on the scheduled and actual trips and documents listing the quality constraints each service needs to meet. Interrogating such datasets with SQL queries and then organizing and visualizing the data can be quite complex for most users. This paper presents a chatbot offering a user-friendly tool to interact with these datasets and support decision making. It is based on an agent architecture, which expands the capabilities of the core Large Language Model (LLM) by allowing it to interact with a series of tools that can execute several tasks, like performing SQL queries, plotting data and creating maps from the coordinates of a trip and its stops. This paper also tackles one of the main open problems of such Generative AI projects: collecting data to measure the system's performance. Our chatbot has been extensively tested with a workflow that asks several questions and stores the generated query, the retrieved data and the natural language response for each of them. Such questions are drawn from a set of base examples which are then completed with actual data from the database. This procedure yields a dataset for the evaluation of the chatbot's performance, especially the consistency of its answers and the correctness of the generated queries.

large language model, machine learning, natural language, (18 more...)

2505.22698

Country: Europe > Italy (0.17)

Genre:

Research Report (0.64)
Overview (0.47)

Industry: Transportation > Infrastructure & Services (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Comparing Human and AI Rater Effects Using the Many-Facet Rasch Model

Jiao, Hong, Song, Dan, Lee, Won-Chan

Large language models (LLMs) have been widely explored for automated scoring in low-stakes assessment to facilitate learning and instruction. Empirical evidence on which LLM produces the most reliable scores and induces the fewest rater effects needs to be collected before LLMs are used for automated scoring in practice. This study compared ten LLMs (ChatGPT 3.5, ChatGPT 4, ChatGPT 4o, OpenAI o1, Claude 3.5 Sonnet, Gemini 1.5, Gemini 1.5 Pro, Gemini 2.0, DeepSeek V3, and DeepSeek R1) with human expert raters in scoring two types of writing tasks. The accuracy of the holistic and analytic scores from LLMs compared with human raters was evaluated in terms of Quadratic Weighted Kappa. Intra-rater consistency across prompts was compared in terms of Cronbach's alpha. Rater effects of LLMs were evaluated and compared with human raters using the Many-Facet Rasch model. The results in general supported the use of ChatGPT 4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet, which showed high scoring accuracy, better rater reliability, and fewer rater effects.
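As an illustration of the agreement metric named above, here is a minimal sketch of Quadratic Weighted Kappa between two raters on an integer score scale. The function name, the example scores, and the convention of returning 1.0 when both raters use a single identical category are assumptions made for this sketch; the study's own computation is not described in the abstract.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Quadratic Weighted Kappa for two lists of integer scores (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("the scale needs at least two score categories")

    n = max_score - min_score + 1
    total = len(rater_a)

    # Observed confusion matrix: rows are rater A's scores, columns rater B's.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score counts for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2      # quadratic disagreement penalty
            expected = hist_a[i] * hist_b[j] / total  # count expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Assumed convention: both raters used one identical category throughout.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Made-up scores on a 1-4 scale, for illustration only.
human = [1, 2, 3, 4, 2, 3]
model = [1, 2, 3, 4, 3, 3]
print(quadratic_weighted_kappa(human, model, 1, 4))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.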

large language model, machine learning, natural language, (20 more...)

2505.18486

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.93)

Industry:

Education > Assessment & Standards (0.95)
Education > Educational Setting (0.68)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.56)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.36)

Morabito, Roberto, Jang, SiYoung

Smaller, Smarter, Closer: The Edge of Collaborative Generative AI

The rapid adoption of generative AI (GenAI), particularly Large Language Models (LLMs), has exposed critical limitations of cloud-centric deployments, including latency, cost, and privacy concerns. Meanwhile, Small Language Models (SLMs) are emerging as viable alternatives for resource-constrained edge environments, though they often lack the capabilities of their larger counterparts. This article explores the potential of collaborative inference systems that leverage both edge and cloud resources to address these challenges. By presenting distinct cooperation strategies alongside practical design principles and experimental insights, we offer actionable guidance for deploying GenAI across the computing continuum. Ultimately, this work underscores the great potential of edge-first approaches in realizing the promise of GenAI in diverse, real-world applications.

It is no longer necessary to elaborate extensively on the transformative impact of generative AI (GenAI) models, particularly Large Language Models (LLMs), across various sectors of society. From healthcare to education, entertainment to software development and IoT [1], it is evident that nearly every application domain is ready to be (or already is being) influenced by these technologies. LLMs like GPT-4, powered by transformer architectures with billions of parameters, excel in diverse NLP tasks (e.g., summarization, translation, query answering) and high-level reasoning.

large language model, machine learning, natural language, (21 more...)

2505.16499

Country: Europe (0.28)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.92)

The Guardian, May-29-2025, 02:00:14 GMT

The OpenAI empire - podcast

In 2019, before most of the world had heard of the company, the technology journalist Karen Hao spent three days embedded in the offices of OpenAI. What she saw, she tells Michael Safi, was a company vastly at odds with its public image: that of a transparent non-profit developing artificial intelligence technology purely for the benefit of humanity. "They said that they were transparent. They said that they were collaborative. They were actually very secretive."

openai empire, podcast

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.82)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.82)

arXiv.org Artificial Intelligence, May-29-2025

Xinyu AI Search: Enhanced Relevance and Comprehensive Results with Rich Answer Presentations

Tang, Bo, Zhu, Junyi, Xi, Chenyang, Ge, Yunhang, Wu, Jiahao, Feng, Yuchen, Niu, Yijun, Wei, Wenqiang, Yu, Yu, Li, Chunyu, Lin, Zehao, Wu, Hao, Liao, Ning, Yang, Yebin, Wang, Jiajia, Li, Zhiyu, Xiong, Feiyu, Chen, Jingrun

Traditional search engines struggle to synthesize fragmented information for complex queries, while generative AI search engines face challenges in relevance, comprehensiveness, and presentation. To address these limitations, we introduce Xinyu AI Search, a novel system that incorporates a query-decomposition graph to dynamically break down complex queries into sub-queries, enabling stepwise retrieval and generation. Our retrieval pipeline enhances diversity through multi-source aggregation and query expansion, while filtering and re-ranking strategies optimize passage relevance. Additionally, Xinyu AI Search introduces a novel approach for fine-grained, precise built-in citation and innovates in result presentation by integrating timeline visualization and textual-visual choreography. Evaluated on recent real-world queries, Xinyu AI Search outperforms eight existing technologies in human assessments, excelling in relevance, comprehensiveness, and insightfulness. Ablation studies validate the necessity of its key sub-modules. Our work presents the first comprehensive framework for generative AI search engines, bridging retrieval, generation, and user-centric presentation.

information retrieval, large language model, machine learning, (14 more...)

2505.21849

Country:

Asia (1.00)
North America > United States (0.68)
Europe (0.68)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.48)

Industry: Information Technology (0.67)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.55)

Ulloa, Roberto, Zucker, Eve M., Bultmann, Daniel, Simon, David J., Makhortykh, Mykola

From prosthetic memory to prosthetic denial: Auditing whether large language models are prone to mass atrocity denialism

arXiv.org Artificial Intelligence, May-29-2025

The proliferation of large language models (LLMs) can influence how historical narratives are disseminated and perceived. This study explores the implications of LLMs' responses on the representation of mass atrocity memory, examining whether generative AI systems contribute to prosthetic memory, i.e., mediated experiences of historical events, or to what we term "prosthetic denial," the AI-mediated erasure or distortion of atrocity memories. We argue that LLMs function as interfaces that can elicit prosthetic memories and, therefore, act as experiential sites for memory transmission, but also introduce risks of denialism, particularly when their outputs align with contested or revisionist narratives. To empirically assess these risks, we conducted a comparative audit of five LLMs (Claude, GPT, Llama, Mixtral, and Gemini) across four historical case studies: the Holodomor, the Holocaust, the Cambodian Genocide, and the genocide against the Tutsis in Rwanda. Each model was prompted with questions addressing common denialist claims in English and an alternative language relevant to each case (Ukrainian, German, Khmer, and French). Our findings reveal that while LLMs generally produce accurate responses for widely documented events like the Holocaust, significant inconsistencies and susceptibility to denialist framings are observed for more underrepresented cases like the Cambodian Genocide. The disparities highlight the influence of training data availability and the probabilistic nature of LLM responses on memory integrity. We conclude that while LLMs extend the concept of prosthetic memory, their unmoderated use risks reinforcing historical denialism, raising ethical concerns for (digital) memory preservation, and potentially challenging the advantageous role of technology associated with the original values of prosthetic memory.

large language model, machine learning, natural language, (20 more...)

2505.21753

Country:

Europe (1.00)
North America > United States (0.46)
Africa > Rwanda (0.36)
Asia > Russia (0.28)

Genre: Research Report > New Finding (0.88)

Industry:

Leisure & Entertainment (0.67)
Media > News (0.47)
Government > Regional Government (0.46)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.67)