 Generative AI


EuroLLM-9B: Technical Report

arXiv.org Artificial Intelligence

This report presents EuroLLM-9B, a large language model trained from scratch to support the needs of European citizens by covering all 24 official European Union languages and 11 additional languages. EuroLLM addresses the issue of European languages being underrepresented and underserved in existing open large language models. We provide a comprehensive overview of EuroLLM-9B's development, including tokenizer design, architectural specifications, data filtering, and training procedures. We describe the pre-training data collection and filtering pipeline, including the creation of EuroFilter, an AI-based multilingual filter, as well as the design of EuroBlocks-Synthetic, a novel synthetic dataset for post-training that enhances language coverage for European languages. Evaluation results demonstrate EuroLLM-9B's competitive performance on multilingual benchmarks and machine translation tasks, establishing it as the leading open European-made LLM of its size. To support open research and adoption, we release all major components of this work, including the base and instruction-tuned models, the EuroFilter classifier, and the synthetic post-training dataset. Large language models (LLMs) have emerged as key drivers of progress in natural language processing (NLP) and artificial intelligence (AI), with notable examples including OpenAI's GPT series (OpenAI et al., 2024), Anthropic's Claude (Anthropic, 2023) or Google's Gemini (Google et al., 2025). LLMs are first pre-trained on vast amounts of unlabelled data relying on a self-supervised task (e.g., next-word prediction or missing-word prediction). This process enables the model to acquire knowledge, to develop strong language understanding and generation skills, and to perform various downstream tasks, often leveraging in-context learning techniques.
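The self-supervised next-word prediction task mentioned above can be illustrated with a minimal sketch. It is not the EuroLLM training code: the function name `next_word_pairs` and the whitespace tokenization are assumptions made purely for illustration, whereas real LLMs operate on subword tokens. The point is only that the training targets come from the text itself, so no manual labels are needed.

```python
from typing import List, Tuple


def next_word_pairs(text: str) -> List[Tuple[List[str], str]]:
    """Turn raw text into (context, next word) training pairs.

    Assumption: whitespace tokenization, used here only to keep the
    sketch short. Production models use subword tokenizers instead.
    """
    tokens = text.split()
    # Each prefix of the sentence is a context; the word that follows is the target.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


pairs = next_word_pairs("the cat sat")
for context, target in pairs:
    print(context, "->", target)
```

Every sentence of unlabelled text yields one training pair per word after the first, which is why large raw corpora are sufficient for pre-training.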


Controlling Context: Generative AI at Work in Integrated Circuit Design and Other High-Precision Domains

arXiv.org Artificial Intelligence

Generative AI tools have become more prevalent in engineering workflows, particularly through chatbots and code assistants. As the perceived accuracy of these tools improves, questions arise about whether and how those who work in high-precision domains might maintain vigilance for errors, and what other aspects of using such tools might trouble their work. This paper analyzes interviews with hardware and software engineers, and their collaborators, who work in integrated circuit design to identify the role accuracy plays in their use of generative AI tools and what other forms of trouble they face in using such tools. The paper inventories these forms of trouble, which are then mapped to elements of generative AI systems, to conclude that controlling the context of interactions between engineers and the generative AI tools is one of the largest challenges they face. The paper concludes with recommendations for mitigating this form of trouble by increasing the ability to control context interactively.


DeepSeq: High-Throughput Single-Cell RNA Sequencing Data Labeling via Web Search-Augmented Agentic Generative AI Foundation Models

arXiv.org Artificial Intelligence

Generative AI foundation models offer transformative potential for processing structured biological data, particularly in single-cell RNA sequencing, where datasets are rapidly scaling toward billions of cells. We propose the use of agentic foundation models with real-time web search to automate the labeling of experimental data, achieving up to 82.5% accuracy. This addresses a key bottleneck in supervised learning for structured omics data by increasing annotation throughput without manual curation and human error. Our approach enables the development of virtual cell foundation models capable of downstream tasks such as cell-typing and perturbation prediction. As data volume grows, these models may surpass human performance in labeling, paving the way for reliable inference in large-scale perturbation screens. This application demonstrates domain-specific innovation in health monitoring and diagnostics, aligned with efforts like the Human Cell Atlas and Human Tumor Atlas Network.


OpenAI wins $200m contract with US military for 'warfighting'

The Guardian

The US Department of Defense on Monday awarded OpenAI a $200m contract to put generative artificial intelligence (AI) to work for the US military. The San Francisco-based company will "develop prototype frontier AI capabilities to address critical national security challenges in both warfighting and enterprise domains", according to the defense department's posting of awarded contracts. The program with the defense department is the first partnership under the startup's initiative to put AI to work in governments, according to OpenAI. The company plans to show how cutting-edge AI can vastly improve administrative operations such as how service members get healthcare and also cyber defenses, according to a blog post. The startup claims that all use of AI for the military will be consistent with OpenAI usage guidelines, which are determined by OpenAI itself.


Large Language Models as 'Hidden Persuaders': Fake Product Reviews are Indistinguishable to Humans and Machines

arXiv.org Artificial Intelligence

Reading and evaluating product reviews is central to how most people decide what to buy and consume online. However, the recent emergence of Large Language Models and Generative Artificial Intelligence now means writing fraudulent or fake reviews is potentially easier than ever. Through three studies we demonstrate that (1) humans are no longer able to distinguish between real and fake product reviews generated by machines, averaging only 50.8% accuracy overall - essentially the same that would be expected by chance alone; (2) that LLMs are likewise unable to distinguish between fake and real reviews and perform equivalently bad or even worse than humans; and (3) that humans and LLMs pursue different strategies for evaluating authenticity which lead to equivalently bad accuracy, but different precision, recall and F1 scores - indicating they perform worse at different aspects of judgment. The results reveal that review systems everywhere are now susceptible to mechanised fraud if they do not depend on trustworthy purchase verification to guarantee the authenticity of reviewers. Furthermore, the results provide insight into the consumer psychology of how humans judge authenticity, demonstrating there is an inherent 'scepticism bias' towards positive reviews and a special vulnerability to misjudge the authenticity of fake negative reviews. Additionally, results provide a first insight into the 'machine psychology' of judging fake reviews, revealing that the strategies LLMs take to evaluate authenticity radically differ from humans, in ways that are equally wrong in terms of accuracy, but different in their misjudgments.
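The claim that two judges can be equally inaccurate yet differ in precision, recall and F1 is easy to check with a small worked example. The sketch below uses invented counts, not data from the studies: the two hypothetical judges, the 12 fake / 8 real split, and all predictions are assumptions chosen so that both judges score exactly 50% accuracy.

```python
def metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with 'fake' (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth: 12 fake reviews (1) followed by 8 real reviews (0).
y_true = [1] * 12 + [0] * 8

# Hypothetical sceptical judge: flags most reviews as fake.
sceptical = [1] * 9 + [0] * 3 + [1] * 7 + [0] * 1

# Hypothetical trusting judge: flags few reviews as fake.
trusting = [1] * 3 + [0] * 9 + [1] * 1 + [0] * 7

sceptical_scores = metrics(y_true, sceptical)
trusting_scores = metrics(y_true, trusting)
print(sceptical_scores)
print(trusting_scores)
```

Both judges are right on 10 of 20 reviews, but the sceptical judge catches most fakes at the cost of many false alarms (higher recall, lower precision), while the trusting judge does the reverse. Accuracy alone hides that difference in the kind of mistakes being made.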


EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence

arXiv.org Artificial Intelligence

Constructing a physically realistic and accurately scaled simulated 3D world is crucial for the training and evaluation of embodied intelligence tasks. The diversity, realism, low-cost accessibility, and affordability of 3D data assets are critical for achieving generalization and scalability in embodied AI. However, most current embodied intelligence tasks still rely heavily on traditional 3D computer graphics assets that are manually created and annotated, which suffer from high production costs and limited realism. These limitations significantly hinder the scalability of data-driven approaches. We present EmbodiedGen, a foundational platform for interactive 3D world generation. It enables the scalable generation of high-quality, controllable and photorealistic 3D assets with accurate physical properties and real-world scale in the Unified Robotics Description Format (URDF) at low cost. These assets can be directly imported into various physics simulation engines for fine-grained physical control, supporting downstream tasks in training and evaluation. EmbodiedGen is an easy-to-use, full-featured toolkit composed of six key modules: Image-to-3D, Text-to-3D, Texture Generation, Articulated Object Generation, Scene Generation and Layout Generation. EmbodiedGen generates diverse and interactive 3D worlds composed of generative 3D assets, leveraging generative AI to address the generalization and evaluation challenges of embodied intelligence research. Code is available at https://horizonrobotics.github.io/robot_lab/embodied_gen/index.html.


Perspective on Utilizing Foundation Models for Laboratory Automation in Materials Research

arXiv.org Artificial Intelligence

Tokyo 152-8552, Japan. E-mail: kan.hatakeyama[at]weblab.t.u-tokyo.ac.jp

Abstract: This review explores the potential of foundation models to advance laboratory automation in the materials and chemical sciences. It emphasizes the dual roles of these models: cognitive functions for experimental planning and data analysis, and physical functions for hardware operations. While traditional laboratory automation has relied heavily on specialized, rigid systems, foundation models offer adaptability through their general-purpose intelligence and multimodal capabilities. Recent advancements have demonstrated the feasibility of using large language models (LLMs) and multimodal robotic systems to handle complex and dynamic laboratory tasks. However, significant challenges remain, including precision manipulation of hardware, integration of multimodal data, and ensuring operational safety. This paper outlines a roadmap highlighting future directions, advocating for close interdisciplinary collaboration, benchmark establishment, and strategic human-AI integration to realize fully autonomous experimental laboratories.

Keywords: Laboratory Automation; Foundation Models; Robotics; Artificial Intelligence; Materials Science

1. Expectations for Foundation Models in Materials Laboratory Automation

Laboratory automation, a technology aimed at automating experimental research, is expected to pave the way for a new research paradigm in materials science [1, 2, 3]. By rapidly and comprehensively executing numerous experiments, laboratory automation accelerates research, enhances reproducibility through precisely controlled robotic processes, and enables swift and distributed knowledge sharing among researchers worldwide [1]. This technology is anticipated to contribute significantly to the development of crucial devices and compounds, including catalysts for energy and chemical conversions, environmentally friendly plastics, solar cells, secondary batteries, fuel cells, thermoelectric conversion modules, nuclear fusion reactors, quantum computers, and energy-efficient computing systems [1, 4, 5]. The success of next-generation laboratory automation depends not only on experimental hardware but also on the utilization of artificial intelligence (AI), especially foundation models. Foundation models represent a new AI paradigm encompassing large language models like GPT-4 [6], multimodal models, and agent-related technologies. These foundation models and generative AI have begun to influence chemistry and materials science [7], giving rise to diverse applications including molecular and materials design [8, 9, 10], reaction pathway exploration [11], catalyst design [12], and even autonomous planning of chemical experiments [13]. Additionally, foundation models are being expanded to hardware control mechanisms, enabling natural language-driven robotic operations [14, 15].


InfoFlood: Jailbreaking Large Language Models with Information Overload

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains. However, their potential to generate harmful responses has raised significant societal and regulatory concerns, especially when manipulated by adversarial techniques known as "jailbreak" attacks. Existing jailbreak methods typically involve appending carefully crafted prefixes or suffixes to malicious prompts in order to bypass the built-in safety mechanisms of these models. In this work, we identify a new vulnerability in which excessive linguistic complexity can disrupt built-in safety mechanisms, without the need for any added prefixes or suffixes, allowing attackers to elicit harmful outputs directly. We refer to this phenomenon as Information Overload. To automatically exploit this vulnerability, we propose InfoFlood, a jailbreak attack that transforms malicious queries into complex, information-overloaded queries capable of bypassing built-in safety mechanisms. Specifically, InfoFlood: (1) uses linguistic transformations to rephrase malicious queries, (2) identifies the root cause of failure when an attempt is unsuccessful, and (3) refines the prompt's linguistic structure to address the failure while preserving its malicious intent. We empirically validate the effectiveness of InfoFlood on four widely used LLMs (GPT-4o, GPT-3.5-turbo, Gemini 2.0, and LLaMA 3.1) by measuring their jailbreak success rates. InfoFlood consistently outperforms baseline attacks, achieving up to 3 times higher success rates across multiple jailbreak benchmarks. Furthermore, we demonstrate that commonly adopted post-processing defenses, including OpenAI's Moderation API, Perspective API, and SmoothLLM, fail to mitigate these attacks. This highlights a critical weakness in traditional AI safety guardrails when confronted with information overload-based jailbreaks.


TuneGenie: Reasoning-based LLM agents for preferential music generation

arXiv.org Artificial Intelligence

Recently, large language models (LLMs) have shown great promise across a diversity of tasks, ranging from generating images to reasoning spatially. Considering their remarkable (and growing) textual reasoning capabilities, we investigate LLMs' potency in conducting analyses of an individual's preferences in music (based on playlist metadata, personal write-ups, etc.) and producing effective prompts (based on these analyses) to be passed to Suno AI (a generative AI tool for music production). Our proposition of a novel LLM-based textual-representation-to-music model (which we call TuneGenie) and the various methods we develop to evaluate and benchmark similar models add to the increasing (and increasingly controversial) corpus of research on the use of AI in generating art.


Artificial Intelligence and Civil Discourse: How LLMs Moderate Climate Change Conversations

arXiv.org Artificial Intelligence

These authors contributed equally to this work. Abstract: As Large Language Models (LLMs) become increasingly integrated into online platforms and digital communication spaces, their potential to influence public discourse, particularly in contentious domains like climate change, demands systematic investigation. This study examines how LLMs naturally moderate climate change conversations through their distinct communicative behaviors, offering insights into their role as facilitators of civil discourse. We conducted a comparative analysis of conversational patterns between LLMs and human participants in climate change discussions across social media platforms. Our investigation employed five state-of-the-art models: three open-source LLMs (Gemma, Llama 3, and Llama 3.3) and two commercial systems (GPT-4o by OpenAI and Claude 3.5 by Anthropic). Through sentiment analysis, we assessed the emotional characteristics and discourse patterns exhibited by both LLMs and human users. Our findings reveal two key mechanisms through which LLMs moderate climate change conversations: First, LLMs consistently demonstrate emotional neutrality, with their responses significantly dominated by neutral sentiment compared to human participants who exhibit more polarized emotional expressions. Second, LLMs maintain notably lower emotional intensity across all interaction contexts, creating a stabilizing effect on conversational dynamics. These results suggest that LLMs possess inherent moderating capabilities that could enhance the quality of public discourse on controversial topics. By maintaining emotional equilibrium and reducing inflammatory rhetoric, LLMs may serve as valuable tools for fostering more constructive and civil climate change conversations online. This research contributes to our understanding of AI's potential role in improving digital discourse and offers implications for the design of AI-mediated communication platforms.