Law
X-Guard: Multilingual Guard Agent for Content Moderation
Upadhayay, Bibek, Behzadan, Vahid, D, Ph.
Large Language Models (LLMs) have rapidly become integral to numerous applications in critical domains where reliability is paramount. Despite significant advances in safety frameworks and guardrails, current protective measures exhibit crucial vulnerabilities, particularly in multilingual contexts. Existing safety systems remain susceptible to adversarial attacks in low-resource languages and through code-switching techniques, primarily due to their English-centric design. Furthermore, the development of effective multilingual guardrails is constrained by the scarcity of diverse cross-lingual training data. Even recent solutions like Llama Guard-3, while offering multilingual support, lack transparency in their decision-making processes. We address these challenges by introducing X-Guard agent, a transparent multilingual safety agent designed to provide content moderation across diverse linguistic contexts. X-Guard effectively defends against both conventional low-resource language attacks and sophisticated code-switching attacks. Our approach includes: curating and enhancing multiple open-source safety datasets with explicit evaluation rationales; employing a jury of judges methodology to mitigate individual judge LLM provider biases; creating a comprehensive multilingual safety dataset spanning 132 languages with 5 million data points; and developing a two-stage architecture combining a custom-finetuned mBART-50 translation module with an evaluation X-Guard 3B model trained through supervised finetuning and GRPO training. Our empirical evaluations demonstrate X-Guard's effectiveness in detecting unsafe content across multiple languages while maintaining transparency throughout the safety evaluation process. Our work represents a significant advancement in creating robust, transparent, and linguistically inclusive safety systems for LLMs and its integrated systems.
Generative AI in Collaborative Academic Report Writing: Advantages, Disadvantages, and Ethical Considerations
Sadeghpour, Mahshid, Arakala, Arathi, Rao, Asha
The availability and abundance of GenAI tools to administer tasks traditionally managed by people have raised concerns, particularly within the education and academic sectors, as some students may highly rely on these tools to complete the assignments designed to enable learning. This article focuses on informing students about the significance of investing their time during their studies on developing essential life-long learning skills using their own critical thinking, rather than depending on AI models that are susceptible to misinformation, hallucination, and bias. As we transition to an AI-centric era, it is important to educate students on how these models work, their pitfalls, and the ethical concerns associated with feeding data to such tools. Keywords: GenAI in Academic Writing GenAI's Ethics GenAI's Privacy Concerns. 1 Introduction Writing academic reports, and papers have been instrumental to assisting students and researchers in shaping their ideas, organising their methods, and practicing their communication skills, particularly when this process is combined with receiving constant feedback from experts. With the launch of OpenAI's first publicly available Large Language Model, namely ChatGPT (GPT-3.5), a significant concern rose within the academic and research community about the reliability of the academic and research output. Evidence suggests that as individuals began discovering the availability and efficiency in using Generative Artificial Intelligence tools in late 2022, there was a significant surge in retracted research articles resulting in more than 10,000 retracted papers [1]. The over-reliance of individuals on various Generative Artificial Intelligence (Gen AI) tools for completing tasks that require a human's critical thinking has raised concerns.
CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization
Yao, Jing, Yi, Xiaoyuan, Wang, Jindong, Dou, Zhicheng, Xie, Xing
As Large Language Models (LLMs) more deeply integrate into human life across various regions, aligning them with pluralistic cultures is crucial for improving user experience and mitigating cultural conflicts. Existing approaches develop culturally aligned LLMs primarily through fine-tuning with massive carefully curated culture-specific corpora. Nevertheless, inspired by culture theories, we identify two key challenges faced by these datasets: (1) Representativeness: These corpora fail to fully capture the target culture's core characteristics with redundancy, causing computation waste; (2) Distinctiveness: They struggle to distinguish the unique nuances of a given culture from shared patterns across other relevant ones, hindering precise cultural modeling. To handle these challenges, we introduce CAReDiO, a novel cultural data construction framework. Specifically, CAReDiO utilizes powerful LLMs to automatically generate cultural conversation data, where both the queries and responses are further optimized by maximizing representativeness and distinctiveness. Using CAReDiO, we construct a small yet effective dataset, covering five cultures, and compare it with several recent cultural corpora. Extensive experiments demonstrate that our method generates more effective data and enables cultural alignment with as few as 100 training samples, enhancing both performance and efficiency.
SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models
Fang, Junfeng, Wang, Yukai, Wang, Ruipeng, Yao, Zijun, Wang, Kun, Zhang, An, Wang, Xiang, Chua, Tat-Seng
The rapid advancement of multi-modal large reasoning models (MLRMs) -- enhanced versions of multimodal language models (MLLMs) equipped with reasoning capabilities -- has revolutionized diverse applications. However, their safety implications remain underexplored. While prior work has exposed critical vulnerabilities in unimodal reasoning models, MLRMs introduce distinct risks from cross-modal reasoning pathways. This work presents the first systematic safety analysis of MLRMs through large-scale empirical studies comparing MLRMs with their base MLLMs. Our experiments reveal three critical findings: (1) The Reasoning Tax: Acquiring reasoning capabilities catastrophically degrades inherited safety alignment. MLRMs exhibit 37.44% higher jailbreaking success rates than base MLLMs under adversarial attacks. (2) Safety Blind Spots: While safety degradation is pervasive, certain scenarios (e.g., Illegal Activity) suffer 25 times higher attack rates -- far exceeding the average 3.4 times increase, revealing scenario-specific vulnerabilities with alarming cross-model and datasets consistency. (3) Emergent Self-Correction: Despite tight reasoning-answer safety coupling, MLRMs demonstrate nascent self-correction -- 16.9% of jailbroken reasoning steps are overridden by safe answers, hinting at intrinsic safeguards. These findings underscore the urgency of scenario-aware safety auditing and mechanisms to amplify MLRMs' self-correction potential. To catalyze research, we open-source OpenSafeMLRM, the first toolkit for MLRM safety evaluation, providing unified interface for mainstream models, datasets, and jailbreaking methods. Our work calls for immediate efforts to harden reasoning-augmented AI, ensuring its transformative potential aligns with ethical safeguards.
GIScience in the Era of Artificial Intelligence: A Research Agenda Towards Autonomous GIS
Li, Zhenlong, Ning, Huan, Gao, Song, Janowicz, Krzysztof, Li, Wenwen, Arundel, Samantha T., Yang, Chaowei, Bhaduri, Budhendra, Wang, Shaowen, Zhu, A-Xing, Gahegan, Mark, Shekhar, Shashi, Ye, Xinyue, McKenzie, Grant, Cervone, Guido, Hodgson, Michael E.
The advent of generative AI exemplified by large language models (LLMs) opens new ways to represent and compute geographic information and transcends the process of geographic knowledge production, driving geographic information systems (GIS) towards autonomous GIS. Leveraging LLMs as the decision core, autonomous GIS can independently generate and execute geoprocessing workflows to perform spatial analysis. In this vision paper, we further elaborate on the concept of autonomous GIS and present a conceptual framework that defines its five autonomous goals, five autonomous levels, five core functions, and three operational scales. We demonstrate how autonomous GIS could perform geospatial data retrieval, spatial analysis, and map making with four proof-of-concept GIS agents. We conclude by identifying critical challenges and future research directions, including fine-tuning and self-growing decision-cores, autonomous modeling, and examining the societal and practical implications of autonomous GIS. By establishing the groundwork for a paradigm shift in GIScience, this paper envisions a future where GIS moves beyond traditional workflows to autonomously reason, derive, innovate, and advance geospatial solutions to pressing global challenges. Meanwhile, as we design and deploy increasingly intelligent geospatial systems, we carry a responsibility to ensure they are developed in socially responsible ways, serve the public good, and support the continued value of human geographic insight in an AI-augmented future.
Interpretable and Fair Mechanisms for Abstaining Classifiers
Lenders, Daphne, Pugnana, Andrea, Pellungrini, Roberto, Calders, Toon, Pedreschi, Dino, Giannotti, Fosca
Abstaining classifiers have the option to refrain from providing a prediction for instances that are difficult to classify. The abstention mechanism is designed to trade off the classifier's performance on the accepted data while ensuring a minimum number of predictions. In this setting, often fairness concerns arise when the abstention mechanism solely reduces errors for the majority groups of the data, resulting in increased performance differences across demographic groups. While there exist a bunch of methods that aim to reduce discrimination when abstaining, there is no mechanism that can do so in an explainable way. In this paper, we fill this gap by introducing Interpretable and Fair Abstaining Classifier IFAC, an algorithm that can reject predictions both based on their uncertainty and their unfairness. By rejecting possibly unfair predictions, our method reduces error and positive decision rate differences across demographic groups of the non-rejected data. Since the unfairness-based rejections are based on an interpretable-by-design method, i.e., rule-based fairness checks and situation testing, we create a transparent process that can empower human decision-makers to review the unfair predictions and make more just decisions for them. This explainable aspect is especially important in light of recent AI regulations, mandating that any high-risk decision task should be overseen by human experts to reduce discrimination risks.
How to Survive the A.I. Revolution
In the early hours of April 12, 1812, a crowd of men approached Rawfolds Mill, a four-story stone building on the banks of the River Spen, in West Yorkshire. This was Brontë country--a landscape of bleak moors, steep valleys, and small towns nestled in the hollows. The men, who'd assembled on the moors hours earlier, were armed with muskets, sticks, hatchets, and heavy blacksmith's hammers. When they reached the mill, those at the front broke windows to gain entry, and some fired shots into the darkened factory. But the mill's owner, William Cartwright, had been preparing for trouble.
Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents
Buscemi, Alessio, Proverbio, Daniele, Bova, Paolo, Balabanova, Nataliya, Bashir, Adeela, Cimpeanu, Theodor, da Fonseca, Henrique Correia, Duong, Manh Hong, Domingos, Elias Fernandez, Fernandes, Antonio M., Krellner, Marcus, Ogbo, Ndidi Bianca, Powers, Simon T., Santos, Fernando P., Shamszaman, Zia Ush, Song, Zhao, Di Stefano, Alessandro, Han, The Anh
There is general agreement that fostering trust and cooperation within the AI development ecosystem is essential to promote the adoption of trustworthy AI systems. By embedding Large Language Model (LLM) agents within an evolutionary game-theoretic framework, this paper investigates the complex interplay between AI developers, regulators and users, modelling their strategic choices under different regulatory scenarios. Evolutionary game theory (EGT) is used to quantitatively model the dilemmas faced by each actor, and LLMs provide additional degrees of complexity and nuances and enable repeated games and incorporation of personality traits. Our research identifies emerging behaviours of strategic AI agents, which tend to adopt more "pessimistic" (not trusting and defective) stances than pure game-theoretic agents. We observe that, in case of full trust by users, incentives are effective to promote effective regulation; however, conditional trust may deteriorate the "social pact". Establishing a virtuous feedback between users' trust and regulators' reputation thus appears to be key to nudge developers towards creating safe AI. However, the level at which this trust emerges may depend on the specific LLM used for testing. Our results thus provide guidance for AI regulation systems, and help predict the outcome of strategic LLM agents, should they be used to aid regulation itself.
A Survey of Machine Learning Models and Datasets for the Multi-label Classification of Textual Hate Speech in English
Bäumler, Julian, Blöcher, Louis, Frey, Lars-Joel, Chen, Xian, Bayer, Markus, Reuter, Christian
The dissemination of online hate speech can have serious negative consequences for individuals, online communities, and entire societies. This and the large volume of hateful online content prompted both practitioners', i.e., in content moderation or law enforcement, and researchers' interest in machine learning models to automatically classify instances of hate speech. Whereas most scientific works address hate speech classification as a binary task, practice often requires a differentiation into sub-types, e.g., according to target, severity, or legality, which may overlap for individual content. Hence, researchers created datasets and machine learning models that approach hate speech classification in textual data as a multi-label problem. This work presents the first systematic and comprehensive survey of scientific literature on this emerging research landscape in English (N=46). We contribute with a concise overview of 28 datasets suited for training multi-label classification models that reveals significant heterogeneity regarding label-set, size, meta-concept, annotation process, and inter-annotator agreement. Our analysis of 24 publications proposing suitable classification models further establishes inconsistency in evaluation and a preference for architectures based on Bidirectional Encoder Representation from Transformers (BERT) and Recurrent Neural Networks (RNNs). We identify imbalanced training data, reliance on crowdsourcing platforms, small and sparse datasets, and missing methodological alignment as critical open issues and formulate ten recommendations for research.
seeBias: A Comprehensive Tool for Assessing and Visualizing AI Fairness
Ning, Yilin, Ma, Yian, Liu, Mingxuan, Li, Xin, Liu, Nan
Fairness in artificial intelligence (AI) prediction models is increasingly emphasized to support responsible adoption in high-stakes domains such as health care and criminal justice. Guidelines and implementation frameworks highlight the importance of both predictive accuracy and equitable outcomes. However, current fairness toolkits often evaluate classification performance disparities in isolation, with limited attention to other critical aspects such as calibration. To address these gaps, we present seeBias, an R package for comprehensive evaluation of model fairness and predictive performance. seeBias offers an integrated evaluation across classification, calibration, and other performance domains, providing a more complete view of model behavior. It includes customizable visualizations to support transparent reporting and responsible AI implementation. Using public datasets from criminal justice and healthcare, we demonstrate how seeBias supports fairness evaluations, and uncovers disparities that conventional fairness metrics may overlook. The R package is available on GitHub, and a Python version is under development.