Law
Compliance of AI Systems
Schöning, Julius, Kruse, Niklas
The increasing integration of artificial intelligence (AI) systems in various fields requires solid concepts to ensure compliance with upcoming legislation. This paper systematically examines the compliance of AI systems with relevant legislation, focusing on the EU's AI Act and the compliance of data sets. The analysis highlights many challenges associated with edge devices, which are increasingly used to deploy AI applications closer to the data sources. Such devices often face unique issues due to their decentralized nature and the limited computing resources available for implementing sophisticated compliance mechanisms. By analyzing AI implementations, the paper identifies challenges and proposes initial best practices for legal compliance when developing, deploying, and running AI. Data set compliance is highlighted as a cornerstone for ensuring the trustworthiness, transparency, and explainability of AI systems, which must be aligned with the ethical standards set forth in regulatory frameworks such as the AI Act. The insights gained should contribute to the ongoing discourse on the responsible development and deployment of embedded AI systems.
Revitalizing Saturated Benchmarks: A Weighted Metric Approach for Differentiating Large Language Model Performance
Etzine, Bryan, Hashemi, Masoud, Madhusudhan, Nishanth, Davasam, Sagar, Sharma, Roshnee, Madhusudhan, Sathwik Tejaswi, Yadav, Vikas
Existing benchmarks are becoming saturated and struggle to differentiate between models due to factors such as data contamination and advancing LLM capabilities. This paper introduces EMDM (Enhanced Model Differentiation Metric), a novel weighted metric that revitalizes benchmarks by enhancing model separation. EMDM integrates final-answer and Chain-of-Thought (CoT) reasoning correctness, assigning weights based on the complexity and reasoning depth required to solve a given sample in the evaluation data. Using a baseline LLM in two setups (Unguided, where the model has no prior exposure to test samples, and Guided, where the model has prior knowledge of the desired answer), EMDM distinguishes instances of varying difficulty. The CoT and answer correctness from these setups inform an optimization objective for weight assignment, resulting in a more nuanced evaluation of model performance. Compared to the exact match (EM) metric, which achieves 17% separation on ARC-Challenge, EMDM achieves 46%, demonstrating its effectiveness in differentiating models based on reasoning and knowledge requirements.
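To make the idea of a weighted metric concrete, here is a minimal sketch of how per-sample weights can change a model's score relative to exact match. It assumes each sample's credit is a fixed mix of answer and CoT correctness (the `cot_share` parameter) and uses a hypothetical three-level `difficulty_weight` heuristic driven by the Unguided/Guided outcomes. It is not the paper's optimization objective for weight assignment; all names here are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SampleResult:
    """Correctness of one model response on one evaluation sample."""
    answer_correct: bool
    cot_correct: bool


def difficulty_weight(unguided: SampleResult, guided: SampleResult) -> float:
    """Hypothetical heuristic, NOT the paper's optimization objective.

    A sample the baseline solves without guidance counts as easy (1.0);
    one it solves only when guided is harder (2.0); one it fails even
    when guided is hardest (3.0).
    """
    if unguided.answer_correct and unguided.cot_correct:
        return 1.0
    if guided.answer_correct and guided.cot_correct:
        return 2.0
    return 3.0


def weighted_score(results: Sequence[SampleResult], weights: Sequence[float],
                   cot_share: float = 0.5) -> float:
    """Weighted score in [0, 1]; each sample's credit mixes answer and CoT correctness."""
    if len(results) != len(weights):
        raise ValueError("results and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    earned = 0.0
    for result, weight in zip(results, weights):
        # bools act as 0/1, so a sample with a right answer but flawed CoT earns partial credit
        credit = (1 - cot_share) * result.answer_correct + cot_share * result.cot_correct
        earned += weight * credit
    return earned / total


def exact_match(results: Sequence[SampleResult]) -> float:
    """Plain exact-match accuracy: fraction of samples with a correct final answer."""
    return sum(r.answer_correct for r in results) / len(results)
```

For example, a model that answers two of three samples correctly but with flawed reasoning on a heavily weighted one scores 2/3 under exact match, yet only about 0.42 under the weighted score with weights 1, 3, and 2, which is the kind of extra separation a weighted metric is meant to expose.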
Cognitive Bias Detection Using Advanced Prompt Engineering
Lemieux, Frederic, Behr, Aisha, Kellermann-Bryant, Clara, Mohammed, Zaki
Cognitive biases, systematic deviations from rationality in judgment, pose significant challenges in generating objective content. This paper introduces a novel approach for real-time cognitive bias detection in user-generated text using large language models (LLMs) and advanced prompt engineering techniques. The proposed system analyzes textual data to identify common cognitive biases such as confirmation bias, circular reasoning, and hidden assumptions. By designing tailored prompts, the system leverages LLMs' capabilities to both recognize and mitigate these biases, improving the quality of human-generated content (e.g., news, media, reports). Experimental results demonstrate the high accuracy of our approach in identifying cognitive biases, offering a valuable tool for enhancing content objectivity and reducing the risks of biased decision-making.
Introduction
Cognitive biases are systematic patterns of deviation from rational judgment, affecting decision-making processes across various domains, including media, policy-making, and legal reasoning. With the rapid expansion of artificial intelligence (AI) applications, large language models (LLMs) have demonstrated significant potential in processing and evaluating vast amounts of textual information. However, existing research has largely focused on mitigating biases within AI-generated outputs rather than leveraging AI to detect biases in human-generated content. This gap presents a critical challenge in ensuring transparency and fairness in AI-assisted decision-making. This study explores structured prompt engineering as a novel approach to improving LLM accuracy in detecting cognitive biases.
Improving Hate Speech Classification with Cross-Taxonomy Dataset Integration
Algorithmic hate speech detection faces significant challenges due to the diverse definitions and datasets used in research and practice. Social media platforms, legal frameworks, and institutions each apply distinct yet overlapping definitions, complicating classification efforts. This study addresses these challenges by demonstrating that existing datasets and taxonomies can be integrated into a unified model, enhancing prediction performance and reducing reliance on multiple specialized classifiers. The work introduces a universal taxonomy and a hate speech classifier capable of detecting a wide range of definitions within a single framework. Our approach is validated by combining two widely used but differently annotated datasets, showing improved classification performance on an independent test set. This work highlights the potential of dataset and taxonomy integration in advancing hate speech detection, increasing efficiency, and ensuring broader applicability across contexts.
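One way to picture integrating differently annotated datasets into a single classifier is to map every source label onto a shared multi-label taxonomy, while recording which universal labels each source actually annotates. The sketch below assumes two hypothetical datasets (`dataset_a`, `dataset_b`) and a hypothetical four-label universal taxonomy; the paper's real taxonomy and datasets are not reproduced here. The mask lets a downstream model skip labels a source never annotated, rather than treating them as negatives.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical universal taxonomy; not the paper's actual label set.
UNIVERSAL_LABELS: List[str] = ["hate", "offensive", "targets_identity", "threat"]

# Hypothetical source schemas: how each dataset's labels map onto universal labels.
LABEL_MAPS: Dict[str, Dict[str, Set[str]]] = {
    "dataset_a": {
        "hateful": {"hate", "targets_identity"},
        "offensive": {"offensive"},
        "neither": set(),
    },
    "dataset_b": {
        "identity_attack": {"hate", "targets_identity"},
        "threat": {"threat"},
        "none": set(),
    },
}

# Universal labels each source actually annotates; the rest are unknown, not negative.
COVERED: Dict[str, Set[str]] = {
    "dataset_a": {"hate", "offensive", "targets_identity"},
    "dataset_b": {"hate", "targets_identity", "threat"},
}


def to_universal(dataset: str, label: str) -> Tuple[List[int], List[int]]:
    """Return (targets, mask) vectors over UNIVERSAL_LABELS for one source label."""
    try:
        positives = LABEL_MAPS[dataset][label]
    except KeyError:
        raise ValueError(f"unknown dataset/label: {dataset}/{label}") from None
    targets = [1 if name in positives else 0 for name in UNIVERSAL_LABELS]
    mask = [1 if name in COVERED[dataset] else 0 for name in UNIVERSAL_LABELS]
    return targets, mask


def merge(records: List[Tuple[str, str, str]]) -> List[Tuple[str, List[int], List[int]]]:
    """Combine (dataset, text, label) records into one multi-label training set."""
    merged = []
    for dataset, text, label in records:
        targets, mask = to_universal(dataset, label)
        merged.append((text, targets, mask))
    return merged
```

A single multi-label classifier trained on the merged records would compute its per-label loss only where the mask is 1, so one model can serve several definitions without any source overriding another's annotation gaps.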
Jailbreaking is (Mostly) Simpler Than You Think
Russinovich, Mark, Salem, Ahmed
The rapid advancement of artificial intelligence has coincided with increasing concerns regarding the safe and ethical deployment of these systems. As AI models become more capable, ensuring that their behavior aligns with societal norms and safety standards has emerged as a critical research challenge. State-of-the-art alignment techniques, such as reinforcement learning from human feedback and rule-based fine-tuning, strive to constrain models to acceptable ethical behaviors. However, these methods face an inherent tension: while alignment is designed to prevent the disclosure of harmful or sensitive information, adversaries can exploit the gap between a model's potential and its restricted behavior through what is known as a jailbreak. In the context of AI, a jailbreak is any method that circumvents established safety protocols, effectively enabling functionalities that the system would otherwise suppress. Current jailbreaks typically deploy elaborate prompt constructions or optimization strategies; in contrast, in this paper we present the Context Compliance Attack (CCA), a simple, optimization-free jailbreak. CCA leverages a basic yet critical design flaw, namely the reliance on client-supplied conversation history, to subvert AI systems' safeguards and jailbreak them. This paper investigates the efficacy of CCA and explores its implications for current AI safety architectures.
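The abstract does not describe a mitigation, but the flaw it names suggests one direction. As a hedged sketch, assuming a server that must accept conversation history from the client and holds a secret key the client never sees, the server could attach an HMAC tag to each assistant turn it produces and reject any history containing assistant turns it did not sign. The function names and the `"tag"` field are hypothetical; keeping the full history server-side is the simpler alternative where that is feasible.

```python
import hashlib
import hmac
import json
from typing import Dict, List

Turn = Dict[str, str]


def sign_prefix(turns: List[Turn], key: bytes) -> str:
    """HMAC-SHA256 over the role/content of every turn up to and including the last one.

    Signing the whole prefix (not just the assistant message) means earlier
    user turns cannot be edited after the fact either.
    """
    canonical = [{"role": t.get("role", ""), "content": t.get("content", "")} for t in turns]
    payload = json.dumps(canonical, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_history(history: List[Turn], key: bytes) -> bool:
    """Accept client-supplied history only if every assistant turn carries a valid server tag."""
    for index, turn in enumerate(history):
        if turn.get("role") != "assistant":
            continue  # user turns are covered by the next assistant turn's prefix signature
        tag = turn.get("tag")
        if not isinstance(tag, str):
            return False  # an untagged assistant turn was not produced by this server
        expected = sign_prefix(history[: index + 1], key)
        # constant-time comparison on bytes avoids timing leaks and non-ASCII TypeErrors
        if not hmac.compare_digest(expected.encode("ascii"), tag.encode("utf-8")):
            return False
    return True
```

When the server generates a reply, it would compute `sign_prefix(history + [reply], key)` and store the result in the reply's `"tag"` field before returning it. A fabricated assistant turn injected by the client then fails verification, while a new user message appended after the last signed turn is still accepted.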
Transformers for molecular property prediction: Domain adaptation efficiently improves performance
Sultan, Afnan, Rausch-Dupont, Max, Khan, Shahrukh, Kalinina, Olga, Volkamer, Andrea, Klakow, Dietrich
Most current transformer-based chemical language models are pre-trained on millions to billions of molecules. However, such scaling of dataset size has not been clearly linked to improved molecular property prediction. The aim of this study is to investigate and overcome some of the limitations of transformer models in predicting molecular properties. Specifically, we examine the impact of pre-training dataset size and diversity on the performance of transformer models and investigate domain adaptation as a technique for improving model performance. First, our findings indicate that increasing the pre-training dataset size beyond 400K molecules from the GuacaMol dataset does not result in a significant improvement on four ADME endpoints, namely solubility, permeability, microsomal stability, and plasma protein binding. Second, our results demonstrate that domain adaptation, i.e., further training the transformer model on a small set of domain-relevant molecules (a few hundred to a few thousand) using multi-task regression of physicochemical properties, was sufficient to significantly improve performance for three of the four investigated ADME endpoints (P-value < 0.001). Finally, we observe that a model pre-trained on 400K molecules and domain-adapted on a few hundred to a few thousand molecules performs similarly (P-value > 0.05) to more complicated transformer models such as MolBERT (pre-trained on 1.3M molecules) and MolFormer (pre-trained on 100M molecules). A random forest model trained on basic physicochemical properties also performed similarly to the examined transformer models. We believe that current transformer models can be improved through further systematic analysis of pre-training and downstream data, pre-training objectives, and scaling laws, ultimately leading to better and more helpful models.
Llamarine: Open-source Maritime Industry-specific Large Language Model
Nguyen, William, Phan, An, Kimura, Konobu, Maeno, Hitoshi, Tanaka, Mika, Le, Quynh, Poucher, William, Nguyen, Christopher
Large Language Models (LLMs) have demonstrated substantial potential in addressing complex reasoning tasks, yet their general-purpose nature often limits their effectiveness in specialized domains such as maritime navigation. To bridge this gap, we introduce Llamarine, the first open-source LLM designed specifically for maritime navigation. Llamarine 1.0 is developed through continued pretraining and fine-tuning on a high-quality corpus comprising maritime textbooks, research publications, and web text from Wikipedia. This domain-specific training enables the model to acquire expert-level knowledge in navigational principles, collision avoidance, route optimization, and regulatory compliance. Our key contributions include (a) the curation of a comprehensive maritime dataset from authoritative sources, ensuring depth and reliability in the model's knowledge base; (b) the development of a foundational model capable of reasoning about complex navigational challenges with greater accuracy than general-purpose LLMs; and (c) the establishment of a benchmark to evaluate performance in maritime-specific decision-making tasks. Experimental results demonstrate that Llamarine outperforms both general-purpose and commercial LLMs in critical navigation-related tasks, such as trajectory planning, risk assessment, and compliance with maritime regulations. By providing an open-source foundation model trained exclusively on high-quality maritime literature, Llamarine paves the way for AI-driven advancements in maritime safety, efficiency, and operational decision-making.
He's Using Autism as a Defense for a Capital Murder. It Might Work.
Bryan Kohberger is accused of committing an unspeakably evil act, stabbing to death four University of Idaho students in their off-campus home in November 2022. The killings were brutal, and as soon as Kohberger was arrested, some members of the victims' families demanded that he be executed if convicted. Kohberger is due to stand trial in August. In the run-up to that trial, his defense lawyers have filed a flurry of motions challenging various aspects of the prosecution's case. Filing such motions is standard in death penalty cases, though in Kohberger's case, the defense and prosecution have done much of that work in secret.
It's time to ban Chinese AI app DeepSeek from 'government devices,' state AGs urge Congress
State attorneys general have joined the growing calls from elected officials urging Congress to pass a law banning the Chinese-owned DeepSeek AI app on all government devices, saying "China is a clear and present danger" to the U.S. "DeepSeek appears to be another tool for Chinese spies to attack America's national security," said the letter, which was signed by 21 attorneys general and addressed to House and Senate leaders. "Given the Chinese desire to steal America's secrets and the ability of DeepSeek to carry out this theft, Congress should quickly pass legislation to ban DeepSeek on government devices," the letter read. "Congress passed similar legislation two years ago to prevent TikTok from stealing information from our government." Montana AG Austin Knudsen, who drafted the letter, wrote that "China is trying to steal America's secrets. Congress should shut down China's latest Trojan horse by passing the No DeepSeek on Government Devices Act."
Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences
Shahid, Adnan, Kliks, Adrian, Al-Tahmeesschi, Ahmed, Elbakary, Ahmed, Nikou, Alexandros, Maatouk, Ali, Mokh, Ali, Kazemi, Amirreza, De Domenico, Antonio, Karapantelakis, Athanasios, Cheng, Bo, Yang, Bo, Wang, Bohao, Fischione, Carlo, Zhang, Chao, Issaid, Chaouki Ben, Yuen, Chau, Peng, Chenghui, Huang, Chongwen, Chaccour, Christina, Thomas, Christo Kurisummoottil, Sharma, Dheeraj, Kalogiros, Dimitris, Niyato, Dusit, De Poorter, Eli, Mhanna, Elissa, Strinati, Emilio Calvanese, Bader, Faouzi, Abdeldayem, Fathi, Wang, Fei, Zhu, Fenghao, Fontanesi, Gianluca, Geraci, Giovanni, Zhou, Haibo, Purmehdi, Hakimeh, Ahmadi, Hamed, Zou, Hang, Du, Hongyang, Lee, Hoon, Yang, Howard H., Poli, Iacopo, Carron, Igor, Chatzistefanidis, Ilias, Lee, Inkyu, Pitsiorlas, Ioannis, Fontaine, Jaron, Wu, Jiajun, Zeng, Jie, Li, Jinan, Karam, Jinane, Gemayel, Johny, Deng, Juan, Frison, Julien, Huang, Kaibin, Qiu, Kehai, Ball, Keith, Wang, Kezhi, Guo, Kun, Tassiulas, Leandros, Gwenole, Lecorve, Yue, Liexiang, Bariah, Lina, Powell, Louis, Dryjanski, Marcin, Galdon, Maria Amparo Canaveras, Kountouris, Marios, Hafeez, Maryam, Elkael, Maxime, Bennis, Mehdi, Boudjelli, Mehdi, Dai, Meiling, Debbah, Merouane, Polese, Michele, Assaad, Mohamad, Benzaghta, Mohamed, Refai, Mohammad Al, Djerrab, Moussab, Syed, Mubeen, Amir, Muhammad, Yan, Na, Alkaabi, Najla, Li, Nan, Sehad, Nassim, Nikaein, Navid, Hashash, Omar, Sroka, Pawel, Yang, Qianqian, Zhao, Qiyang, Silab, Rasoul Nikbakht, Ying, Rex, Morabito, Roberto, Li, Rongpeng, Madi, Ryad, Ayoubi, Salah Eddine El, D'Oro, Salvatore, Lasaulce, Samson, Shalmashi, Serveh, Liu, Sige, Cherrared, Sihem, Chetty, Swarna Bindu, Dutta, Swastika, Zaidi, Syed A. R., Chen, Tianjiao, Murphy, Timothy, Melodia, Tommaso, Quek, Tony Q. S., Ram, Vishnu, Saad, Walid, Hamidouche, Wassim, Chen, Weilong, Liu, Xiaoou, Yu, Xiaoxue, Wang, Xijun, Shang, Xingyu, Wang, Xinquan, Cao, Xuelin, Su, Yang, Liang, Yanping, Deng, Yansha, Yang, Yifan, Cui, Yingping, Sun, Yu, Chen, Yuxuan, Pointurier, Yvan, Nehme, Zeinab, Nezami, Zeinab, Yang, Zhaohui, Zhang, Zhaoyang, Liu, Zhe, Yang, Zhenyu, Han, Zhu, Zhou, Zhuang, Chen, Zihan, Chen, Zirui, Shuai, Zitao
The rise of generative artificial intelligence (AI) as a novel frontier that uniquely merges advanced levels of intelligence with revolutionary user experiences is redefining the AI landscape for future cellular networks. In particular, the transition towards 6G systems has introduced a myriad of challenges inherent to their AI-native network design, requiring innovative solutions to enable real-time network orchestration, intelligent decision-making, and adaptive dynamic configurations. Meanwhile, the envisioned user experiences for 6G are growing increasingly complex, with demands that exceed the capabilities of legacy wireless technologies and conventional AI solutions. With its disruptive impact evident across diverse fields, generative AI has immense potential to tackle these challenges, leveraging its exceptional capabilities to manage complex tasks, operate autonomously, and adapt seamlessly to scenarios beyond its training domain. Generative AI thus offers telecom and cellular networks a transformative opportunity to bridge this gap in 6G systems, ushering in a new era of cutting-edge AI innovation across the different system and user levels.