Generative AI
When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds
Complex games like chess and Go have long been used to test AI models' capabilities. But while IBM's Deep Blue defeated reigning world chess champion Garry Kasparov in the 1990s by playing by the rules, today's advanced AI models like OpenAI's o1-preview are less scrupulous. When they sense defeat in a match against a skilled chess bot, they don't always concede, instead sometimes opting to cheat by hacking their opponent so that the bot automatically forfeits the game. That is the finding of a new study from Palisade Research, shared exclusively with TIME ahead of its publication on Feb. 19, which evaluated seven state-of-the-art AI models for their propensity to hack. While slightly older AI models like OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet needed to be prompted by researchers to attempt such tricks, o1-preview and DeepSeek R1 pursued the exploit on their own, indicating that AI systems may develop deceptive or manipulative strategies without explicit instruction.
Microsoft wants to use generative AI tool to help make video games
An artificial intelligence model from Microsoft can recreate realistic video game footage that the company says could help designers make games, but experts are unconvinced that the tool will be useful for most game developers. Neural networks that can produce coherent and accurate footage from video games are not new. A recent Google-created AI generated a fully playable version of the classic computer game Doom without access to the underlying game engine. The original Doom, however, was released in 1993; more modern games are far more complex, with sophisticated physics and computationally intensive graphics, which have proved trickier for AIs to faithfully recreate. Now, Katja Hofmann at Microsoft Research and her colleagues have developed an AI model called Muse, which can recreate full sequences of the multiplayer online battle game Bleeding Edge. These sequences appear to obey the game's underlying physics and keep players and in-game objects consistent over time, which implies that the model has developed a deep understanding of the game, says Hofmann.
EU accused of leaving 'devastating' copyright loophole in AI Act
"What I do not understand is that we are supporting big tech instead of protecting European creative ideas and content." The EU's AI Act, which came into force last year, was already in the works when ChatGPT, an AI chatbot that can generate essays, jokes and job applications, burst into public consciousness in late 2022, becoming the fastest-growing consumer application in history. ChatGPT was developed by OpenAI, which is also behind the AI image generator Dall-E. He would like legislation to fill that gap, but said it would take years, after the European Commission's decision last week to withdraw the proposed AI Liability Directive. "It might be getting very difficult."
Before Going to Tokyo, I Tried Learning Japanese With ChatGPT
On the final day of my visit to Japan, I'm alone and floating in some skyscraper's rooftop hot springs, praying no one joins me. For the last few months, I've been using ChatGPT's Advanced Voice Mode as an AI language tutor, part of a test to judge generative AI's potential as both a learning tool and a travel companion. The excessive talking to both strangers and a chatbot on my phone was illuminating as well as exhausting. I'm ready to shut my yapper for a minute and enjoy the silence. When OpenAI launched ChatGPT late in 2022, it set off a firestorm of generative AI competition and public interest.
Local Differences, Global Lessons: Insights from Organisation Policies for International Legislation
Kaffee, Lucie-Aimée, Atanasova, Pepa, Rogers, Anna
The rapid adoption of AI across diverse domains has led to the development of organisational guidelines that vary significantly, even within the same sector. This paper examines AI policies in two domains, news organisations and universities, to understand how bottom-up governance approaches shape AI usage and oversight. By analysing these policies, we identify key areas of convergence and divergence in how organisations address risks such as bias, privacy, misinformation, and accountability. We then explore the implications of these findings for international AI legislation, particularly the EU AI Act, highlighting gaps where practical policy insights could inform regulatory refinements. Our analysis reveals that organisational policies often address issues such as AI literacy, disclosure practices, and environmental impact, areas that are underdeveloped in existing international frameworks. We argue that lessons from domain-specific AI policies can contribute to more adaptive and effective AI governance at the global level. This study provides actionable recommendations for policymakers seeking to bridge the gap between local AI practices and international regulations.
On the logical skills of large language models: evaluations using arbitrarily complex first-order logic problems
Ibragimov, Shokhrukh, Jentzen, Arnulf, Kuckuck, Benno
We present a method of generating first-order logic statements whose complexity can be controlled along multiple dimensions. We use this method to automatically create several datasets consisting of questions asking for the truth or falsity of first-order logic statements in Zermelo-Fraenkel set theory. While the resolution of these questions does not require any knowledge beyond basic notation of first-order logic and set theory, it does require a degree of planning and logical reasoning, which can be controlled up to arbitrarily high difficulty by the complexity of the generated statements. Furthermore, we conduct extensive evaluations of the performance of various large language models, including recent models such as DeepSeek-R1 and OpenAI's o3-mini, on these datasets. All of the datasets, along with the code used for generating them and all data from the evaluations, are publicly available at https://github.com/bkuckuck/logical-skills-of-llms.
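The abstract does not spell out how the generator works, so what follows is only a minimal sketch of the general idea under our own assumptions: closed sentences in prenex form over the membership relation, with two tunable complexity dimensions (the number of quantifiers and the connective depth of the quantifier-free part). All function names and the tuple encoding are hypothetical. Truth is checked by brute force over a small finite level V_n of the cumulative hierarchy; this is a cheap proxy and is not the same as truth in Zermelo-Fraenkel set theory.

```python
import random
from itertools import combinations

# Formulas are nested tuples (a hypothetical encoding):
#   ("in", "x0", "x1")                          atomic membership x0 ∈ x1
#   ("not", f), ("and", f, g), ("or", f, g)     connectives
#   ("forall", "x0", f), ("exists", "x0", f)    quantifiers


def random_matrix(variables, depth, rng):
    """Random quantifier-free formula over `variables` with connective depth <= depth."""
    if depth == 0 or rng.random() < 0.2:
        return ("in", rng.choice(variables), rng.choice(variables))
    op = rng.choice(["not", "and", "or"])
    if op == "not":
        return ("not", random_matrix(variables, depth - 1, rng))
    return (op, random_matrix(variables, depth - 1, rng),
            random_matrix(variables, depth - 1, rng))


def random_sentence(num_quantifiers, depth, rng):
    """Closed prenex sentence: `num_quantifiers` quantifiers in front of a random matrix."""
    if num_quantifiers < 1:
        raise ValueError("need at least one quantified variable")
    variables = [f"x{i}" for i in range(num_quantifiers)]
    formula = random_matrix(variables, depth, rng)
    for var in reversed(variables):  # x0 ends up as the outermost quantifier
        formula = (rng.choice(["forall", "exists"]), var, formula)
    return formula


def universe(n):
    """Level V_n of the cumulative hierarchy: V_0 is empty, V_{k+1} is the power set of V_k."""
    level = []
    for _ in range(n):
        level = [frozenset(c) for r in range(len(level) + 1)
                 for c in combinations(level, r)]
    return level


def holds(formula, domain, env=None):
    """Brute-force truth of `formula` when every quantifier ranges over `domain`."""
    env = {} if env is None else env
    tag = formula[0]
    if tag == "in":
        return env[formula[1]] in env[formula[2]]
    if tag == "not":
        return not holds(formula[1], domain, env)
    if tag == "and":
        return holds(formula[1], domain, env) and holds(formula[2], domain, env)
    if tag == "or":
        return holds(formula[1], domain, env) or holds(formula[2], domain, env)
    if tag in ("forall", "exists"):
        var, body = formula[1], formula[2]
        results = (holds(body, domain, {**env, var: s}) for s in domain)
        return all(results) if tag == "forall" else any(results)
    raise ValueError(f"unknown formula tag: {tag!r}")


def show(f):
    """Render a formula with the usual logical symbols."""
    tag = f[0]
    if tag == "in":
        return f"{f[1]} ∈ {f[2]}"
    if tag == "not":
        return f"¬({show(f[1])})"
    if tag in ("and", "or"):
        sym = "∧" if tag == "and" else "∨"
        return f"({show(f[1])} {sym} {show(f[2])})"
    sym = "∀" if tag == "forall" else "∃"
    return f"{sym}{f[1]} {show(f[2])}"
```

For example, `random_sentence(3, 2, random.Random(0))` builds a three-quantifier sentence that `show` renders and `holds(..., universe(3))` evaluates over the four-element set V_3. Raising either argument makes a sentence harder to evaluate by hand, which loosely mirrors the abstract's point that difficulty can be scaled arbitrarily.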
UM_FHS at TREC 2024 PLABA: Exploration of Fine-tuning and AI agent approach for plain language adaptations of biomedical text
Kocbek, Primoz, Kopitar, Leon, Zhang, Zhihong, Aydin, Emirhan, Topaz, Maxim, Stiglic, Gregor
This paper describes our submissions to the TREC 2024 PLABA track, with the aim of simplifying biomedical abstracts for a K8-level audience (13- to 14-year-old students). We tested three approaches using OpenAI's gpt-4o and gpt-4o-mini models: baseline prompt engineering, a two-AI-agent approach, and fine-tuning. Adaptations were evaluated using qualitative metrics (5-point Likert scales for simplicity, accuracy, completeness, and brevity) and quantitative readability scores (Flesch-Kincaid grade level, SMOG Index). Results indicated that the two-agent approach and baseline prompt engineering with gpt-4o-mini models show superior qualitative performance, while fine-tuned models excelled in accuracy and completeness but were less simple. The evaluation results demonstrated that prompt engineering with gpt-4o-mini outperforms iterative improvement strategies via the two-agent approach as well as fine-tuning with gpt-4o. We intend to expand our investigation of the results and explore advanced evaluations.
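Both readability scores named above have standard published formulas: Flesch-Kincaid grade level is 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59, and SMOG is 1.0430 × √(polysyllables × 30/sentences) + 3.1291, where polysyllables are words of three or more syllables. The sketch below computes both in Python. The syllable counter is a crude vowel-group heuristic of our own (an assumption, not what the track or the authors used), so results will differ somewhat from dedicated readability tools; the functions also assume non-empty English text.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Heuristic only: count vowel groups, drop a silent final 'e', never return 0."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str):
    """Return (sentence count, word count, per-word syllable counts)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    return len(sentences), len(words), [count_syllables(w) for w in words]


def flesch_kincaid_grade(text: str) -> float:
    """0.39 * words/sentence + 11.8 * syllables/word - 15.59."""
    n_sent, n_words, syllables = text_stats(text)
    return 0.39 * (n_words / n_sent) + 11.8 * (sum(syllables) / n_words) - 15.59


def smog_index(text: str) -> float:
    """1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291."""
    n_sent, _, syllables = text_stats(text)
    polysyllables = sum(1 for s in syllables if s >= 3)
    return 1.0430 * math.sqrt(polysyllables * (30 / n_sent)) + 3.1291
```

Note that SMOG was designed for samples of 30 sentences; the 30/sentences factor rescales shorter texts, which makes scores on a single short abstract fairly noisy.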
Multi-Agent Risks from Advanced AI
Hammond, Lewis, Chan, Alan, Clifton, Jesse, Hoelscher-Obermaier, Jason, Khan, Akbir, McLean, Euan, Smith, Chandler, Barfuss, Wolfram, Foerster, Jakob, Gavenčiak, Tomáš, Han, The Anh, Hughes, Edward, Kovařík, Vojtěch, Kulveit, Jan, Leibo, Joel Z., Oesterheld, Caspar, de Witt, Christian Schroeder, Shah, Nisarg, Wellman, Michael, Bova, Paolo, Cimpeanu, Theodor, Ezell, Carson, Feuillade-Montixi, Quentin, Franklin, Matija, Kran, Esben, Krawczuk, Igor, Lamparth, Max, Lauffer, Niklas, Meinke, Alexander, Motwani, Sumeet, Reuel, Anka, Conitzer, Vincent, Dennis, Michael, Gabriel, Iason, Gleave, Adam, Hadfield, Gillian, Haghtalab, Nika, Kasirzadeh, Atoosa, Krier, Sébastien, Larson, Kate, Lehman, Joel, Parkes, David C., Piliouras, Georgios, Rahwan, Iyad
The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, as well as seven key risk factors (information asymmetries, network effects, selection pressures, destabilising dynamics, commitment problems, emergent agency, and multi-agent security) that can underpin them. We highlight several important instances of each risk, as well as promising directions to help mitigate them. By anchoring our analysis in a range of real-world examples and experimental evidence, we illustrate the distinct challenges posed by multi-agent systems and their implications for the safety, governance, and ethics of advanced AI.
Personalized Education with Generative AI and Digital Twins: VR, RAG, and Zero-Shot Sentiment Analysis for Industry 4.0 Workforce Development
Lin, Yu-Zheng, Petal, Karan, Alhamadah, Ahmed H, Ghimire, Sujan, Redondo, Matthew William, Corona, David Rafael Vidal, Pacheco, Jesus, Salehi, Soheil, Satam, Pratik
While Fourth Industrial Revolution (4IR) technologies such as cloud computing, machine learning, and artificial intelligence have brought convenience and productivity improvements, they have also introduced new challenges in training and education that require reskilling existing employees and building a new workforce. Existing workforce shortages exacerbate these challenges: the large-scale effort to reskill and build a high-tech workforce capable of operating and maintaining 4IR systems requires higher student retention and persistence. Higher retention and persistence will be especially critical when training workers from marginalized communities such as Underrepresented Minorities (URM), where challenges arise from a lack of access to high-quality education throughout the trainee's formative years (pre-, middle, and high school), creating a cyclic set of knowledge dependencies that are difficult to meet. To address these challenges, this research presents the Generative AI-based Personalized Tutor for Industrial 4.0 (gAI-PT4I4), a framework that personalizes 4IR experiential learning: it uses sentiment analysis to gauge students' comprehension and combines generative AI with a finite automaton to tailor the content to each student's learning needs. The framework delivers experiential learning through low-fidelity Digital Twins that enable virtual reality (VR) training exercises focused on 4IR skills. The VR environment integrates a generative AI teaching assistant, the Interactive Tutor, which guides the student through the training exercises using audio and text communication.
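The abstract pairs sentiment analysis with a finite automaton but does not describe the automaton itself. As a purely hypothetical illustration, the sketch below treats difficulty levels as states and sentiment labels as input symbols. The labels are assumed to come from an upstream zero-shot sentiment classifier, which is not implemented here, and the states, transition table, and prompt template are all our own inventions rather than the gAI-PT4I4 design.

```python
from enum import Enum


class Level(Enum):
    """Hypothetical difficulty levels used as automaton states."""
    REMEDIAL = "remedial"
    STANDARD = "standard"
    ADVANCED = "advanced"


# Hypothetical transition table: (current level, sentiment label) -> next level.
# Positive sentiment moves the student up a level, negative moves them down,
# neutral keeps them where they are; the extremes saturate.
TRANSITIONS = {
    (Level.REMEDIAL, "negative"): Level.REMEDIAL,
    (Level.REMEDIAL, "neutral"): Level.REMEDIAL,
    (Level.REMEDIAL, "positive"): Level.STANDARD,
    (Level.STANDARD, "negative"): Level.REMEDIAL,
    (Level.STANDARD, "neutral"): Level.STANDARD,
    (Level.STANDARD, "positive"): Level.ADVANCED,
    (Level.ADVANCED, "negative"): Level.STANDARD,
    (Level.ADVANCED, "neutral"): Level.ADVANCED,
    (Level.ADVANCED, "positive"): Level.ADVANCED,
}


def run(sentiments, start=Level.STANDARD):
    """Feed a sequence of sentiment labels through the automaton; return visited states."""
    level = start
    history = [level]
    for label in sentiments:
        level = TRANSITIONS[(level, label)]  # KeyError for labels outside the alphabet
        history.append(level)
    return history


def build_prompt(level: Level, topic: str) -> str:
    """Hypothetical instruction for a generative model; the model call is out of scope."""
    return f"Explain {topic} to a trainee at the {level.value} level."
```

Keeping the automaton as an explicit table makes the personalization policy easy to inspect and test independently of the generative model, which only ever sees the current state through the prompt.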
Human-Artificial Interaction in the Age of Agentic AI: A System-Theoretical Approach
Borghoff, Uwe M., Bottoni, Paolo, Pareschi, Remo
This paper presents a novel perspective on human-computer interaction (HCI), framing it as a dynamic interplay between human and computational agents within a networked system. Going beyond traditional interface-based approaches, we emphasize the importance of coordination and communication among heterogeneous agents with different capabilities, roles, and goals. A key distinction is made between multi-agent systems (MAS) and Centaurian systems, which represent two different paradigms of human-AI collaboration. MAS maintain agent autonomy, with structured protocols enabling cooperation, while Centaurian systems deeply integrate human and AI capabilities, creating unified decision-making entities. To formalize these interactions, we introduce a framework for communication spaces, structured into surface, observation, and computation layers, that ensures seamless integration between MAS and Centaurian architectures. Within this framework, colored Petri nets effectively represent structured Centaurian systems, while high-level reconfigurable networks address the dynamic nature of MAS. Our research has practical applications in autonomous robotics, human-in-the-loop decision making, and AI-driven cognitive architectures, and provides a foundation for next-generation hybrid intelligence systems that balance structured coordination with emergent behavior.
Keywords: multi-agent systems; Centaurian systems; communication spaces; satellite and swarm robots; large action models (LAMs)
1 Introduction
Agentic AI systems (capable of iterative planning, autonomous task decomposition, and continuous learning) are rapidly reshaping the landscape of human-computer interaction (HCI). Recent advances in Large Language Models (LLMs) and advanced conversational agents have revitalized the field of multi-agent systems, whose roots in Artificial Intelligence predate the current rise of generative AI.
Historically, multi-agent systems relied on agents with relatively constrained capabilities; however, the emergence of powerful, conversationally
Corresponding author: uwe.borghoff@unibw.de
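To make the colored Petri net formalism mentioned in the abstract more concrete, here is a minimal, generic sketch of the colored-Petri-net firing rule. Places hold multisets of colored (typed) tokens, and a transition fires when some binding of its input variables to available tokens satisfies its guard, consuming those tokens and producing new ones. The example net, in which a human verdict and an AI verdict on the same case are fused into one joint decision only when they agree, is a hypothetical toy and not the authors' model; all class and function names are our own.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Marking = Dict[str, Counter]  # place name -> multiset of colored tokens


@dataclass
class Transition:
    name: str
    inputs: List[Tuple[str, str]]  # (place, variable bound to one consumed token)
    outputs: List[Tuple[str, Callable[[dict], object]]]  # (place, token built from binding)
    guard: Callable[[dict], bool] = lambda binding: True


def find_binding(t: Transition, marking: Marking) -> Optional[dict]:
    """Return the first binding of t's input variables that enables t, or None.

    Assumes each input arc uses a distinct variable name. Brute force over all
    token combinations, which is fine for toy nets only.
    """
    candidates = [[tok for tok, n in marking.get(place, Counter()).items() if n > 0]
                  for place, _ in t.inputs]
    for tokens in product(*candidates):
        needed = Counter(zip((place for place, _ in t.inputs), tokens))
        if any(marking[place][tok] < n for (place, tok), n in needed.items()):
            continue  # two arcs from one place would need more copies than exist
        binding = {var: tok for (_, var), tok in zip(t.inputs, tokens)}
        if t.guard(binding):
            return binding
    return None


def fire(t: Transition, marking: Marking) -> bool:
    """Fire t once if some binding enables it; return whether it fired."""
    binding = find_binding(t, marking)
    if binding is None:
        return False
    for place, var in t.inputs:  # consume one token per input arc
        tok = binding[var]
        marking[place][tok] -= 1
        if marking[place][tok] == 0:
            del marking[place][tok]
    for place, make_token in t.outputs:  # produce one token per output arc
        marking.setdefault(place, Counter())[make_token(binding)] += 1
    return True


def build_demo() -> Tuple[Transition, Marking]:
    """Hypothetical toy net: fuse agreeing human and AI verdicts into one decision."""
    marking: Marking = {
        "human": Counter({("case1", "approve"): 1, ("case2", "reject"): 1}),
        "ai": Counter({("case1", "approve"): 1, ("case2", "approve"): 1}),
        "decided": Counter(),
    }
    joint = Transition(
        name="joint_decision",
        inputs=[("human", "h"), ("ai", "a")],
        outputs=[("decided", lambda b: b["h"])],
        guard=lambda b: b["h"] == b["a"],  # same case and same verdict
    )
    return joint, marking
```

With the net from `build_demo()`, `fire` succeeds once (for `case1`, where both verdicts agree) and then returns `False`, because the remaining human and AI tokens for `case2` disagree and the guard rejects every binding.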