AITopics

Large language model (LLM) unlearning is critical in real-world applications where it is necessary to efficiently remove the influence of private, copyrighted, or harmful data from some users. However, existing utility-centric unlearning metrics (based on model utility) may fail to accurately evaluate the extent of unlearning in realistic settings such as when (a) the forget and retain sets have semantically similar content, (b) retraining the model from scratch on the retain set is impractical, and/or (c) the model owner can improve the unlearning metric without directly performing unlearning on the LLM. This paper presents the first data-centric unlearning metric for LLMs, called WaterDrum, that exploits robust text watermarking to overcome these limitations. We also introduce new benchmark datasets for LLM unlearning that contain varying levels of similar data points and can be used to rigorously evaluate unlearning algorithms using WaterDrum. Our code is available at https://github.com/lululu008/WaterDrum and our new benchmark datasets are released at https://huggingface.co/datasets/Glow-AI/WaterDrum-Ax.

large language model, machine learning, natural language, (19 more...)

2505.05064

Country:

North America > United States > California (0.14)
North America > United States > Virginia (0.04)
Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Elshehaby, Shahad, Panthakkan, Alavikunhu, Al-Ahmad, Hussain, Al-Saad, Mina

Advanced Deep Learning Approaches for Automated Recognition of Cuneiform Symbols

Authors' affiliation: College of Engineering and IT, University of Dubai, Dubai, United Arab Emirates (s0000002884@ud.ac.ae, apanthakkan@ud.ac.ae, halahmad@ud.ac.ae, malsaad@ud.ac.ae).

This paper presents a thoroughly automated method for identifying and interpreting cuneiform characters via advanced deep-learning algorithms. Five distinct deep-learning models were trained on a comprehensive dataset of cuneiform characters and evaluated according to critical performance metrics, including accuracy and precision. Two models demonstrated outstanding performance and were subsequently assessed using cuneiform symbols from the Hammurabi law acquisition, notably Hammurabi Law 1. Each model effectively recognized the relevant Akkadian meanings of the symbols and delivered precise English translations. Future work will investigate ensemble and stacking approaches to optimize performance, utilizing hybrid architectures to improve detection accuracy and reliability.

artificial intelligence, cuneiform character, machine learning, (14 more...)

2505.04678

Country:

Asia > Middle East > UAE > Dubai Emirate > Dubai (1.00)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry: Law (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Briscoe, Jarren, Gebremedhin, Assefaw

Facets of Disparate Impact: Evaluating Legally Consistent Bias in Machine Learning

Leveraging current legal standards, we define bias through the lens of marginal benefits and objective testing with the novel metric "Objective Fairness Index". This index combines the contextual nuances of objective testing with metric stability, providing a legally consistent and reliable measure. Utilizing the Objective Fairness Index, we provide fresh insights into sensitive machine learning applications, such as COMPAS (recidivism prediction), highlighting the metric's practical and theoretical significance. The Objective Fairness Index allows one to differentiate between discriminatory tests and systemic disparities.

artificial intelligence, machine learning, objective fairness index, (16 more...)

doi: 10.1145/3627673.3679925

2505.05471

Country:

Europe (1.00)
North America > United States > Washington (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Law > Labor & Employment Law (0.68)
Government > Regional Government > North America Government > United States Government (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

MTL-UE: Learning to Learn Nothing for Multi-Task Learning

Yu, Yi, Xia, Song, Yang, Siyuan, Kong, Chenqi, Yang, Wenhan, Lu, Shijian, Tan, Yap-Peng, Kot, Alex C.

Most existing unlearnable strategies focus on preventing unauthorized users from training single-task learning (STL) models with personal data. Nevertheless, the paradigm has recently shifted towards multi-task data and multi-task learning (MTL), targeting generalist and foundation models that can handle multiple tasks simultaneously. Despite their growing importance, MTL data and models have been largely neglected while pursuing unlearnable strategies. This paper presents MTL-UE, the first unified framework for generating unlearnable examples for multi-task data and MTL models. Instead of optimizing perturbations for each sample, we design a generator-based structure that introduces label priors and class-wise feature embeddings, which leads to much better attacking performance. In addition, MTL-UE incorporates intra-task and inter-task embedding regularization to increase inter-class separation and suppress intra-class variance, which greatly enhances the attack robustness. Furthermore, MTL-UE is versatile, with good support for dense prediction tasks in MTL. It is also plug-and-play, allowing existing surrogate-dependent unlearnable methods to be integrated with little adaptation. Extensive experiments show that MTL-UE achieves superior attacking performance consistently across 4 MTL datasets, 3 base UE methods, 5 model backbones, and 5 MTL task-weighting strategies.

artificial intelligence, machine learning, perturbation, (13 more...)

2505.05279

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Leibo, Joel Z., Vezhnevets, Alexander Sasha, Cunningham, William A., Krier, Sébastien, Diaz, Manfred, Osindero, Simon

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt

Artificial Intelligence (AI) systems are increasingly placed in positions where their decisions have real consequences, e.g., moderating online spaces, conducting research, and advising on policy. Ensuring they operate in a safe and ethically acceptable fashion is thus critical. However, most solutions have been a form of one-size-fits-all "alignment". We are worried that such systems, which overlook enduring moral diversity, will spark resistance, erode trust, and destabilize our institutions. This paper traces the underlying problem to an often-unstated Axiom of Rational Convergence: the idea that under ideal conditions, rational agents will converge in the limit of conversation on a single ethics. Treating that premise as both optional and doubtful, we propose what we call the appropriateness framework: an alternative approach grounded in conflict theory, cultural evolution, multi-agent systems, and institutional economics. The appropriateness framework treats persistent disagreement as the normal case and designs for it by applying four principles: (1) contextual grounding, (2) community customization, (3) continual adaptation, and (4) polycentric governance. We argue here that adopting these design principles is a good way to shift the main alignment metaphor from moral unification to a more productive metaphor of conflict management, and that taking this step is both desirable and urgent.

artificial intelligence, disagreement, societal and technological progress, (17 more...)

2505.05197

Country:

North America > Canada (0.28)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.40)

Industry:

Law (1.00)
Government (0.93)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Gopalakrishnan, Sriram, Patra, Sunandita

QBD-RankedDataGen: Generating Custom Ranked Datasets for Improving Query-By-Document Search Using LLM-Reranking with Reduced Human Effort

The Query-By-Document (QBD) problem is an information retrieval problem where the query is a document, and the retrieved candidates are documents that match the query document, often in a domain- or query-specific manner. This can be crucial for tasks such as patent matching, legal or compliance case retrieval, and academic literature review. Existing retrieval methods, including keyword search and document embeddings, can be optimized with domain-specific datasets to improve QBD search performance. However, creating these domain-specific datasets is often costly and time-consuming. Our work introduces a process for generating custom QBD-search datasets, which we refer to as QBD-RankedDataGen, and compares a set of methods for this problem. We provide a comparative analysis of our proposed methods in terms of cost, speed, and the human interface with the domain experts. The methods we compare leverage Large Language Models (LLMs), which can incorporate domain expert input to produce document scores and rankings, as well as explanations for human review. The process and methods we present can significantly reduce human effort in dataset creation for custom domains while still obtaining sufficient expert knowledge for tuning retrieval models. We evaluate our methods on QBD datasets from the Text Retrieval Conference (TREC) and fine-tune the parameters of the BM25 model, which is used in many industrial-strength search engines like OpenSearch, using the generated data.
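To make concrete what tuning "the parameters of the BM25 model" could involve, here is a minimal sketch of Okapi BM25 scoring in Python. It is not the authors' implementation: the function name `bm25_scores`, the toy documents, and the whitespace-style token lists are illustrative assumptions, and the IDF variant shown (with `1 +` inside the logarithm) is one common choice among several. The two tunable parameters are `k1` (term-frequency saturation) and `b` (document-length normalization); 1.2 and 0.75 are commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25).

    Hypothetical helper for illustration; k1 and b are the tunable parameters.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Assumed IDF variant: adding 1 inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length-normalized term-frequency saturation.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Toy query-by-document example: the "query" is itself a short document.
docs = [
    ["patent", "claim", "battery"],
    ["battery", "battery", "cell", "anode", "cathode"],
    ["court", "ruling"],
]
query = ["battery", "patent"]
print(bm25_scores(query, docs))                 # default k1 and b
print(bm25_scores(query, docs, k1=2.0, b=0.0))  # no length normalization
```

In a tuning setup like the one the abstract describes, one would presumably search over `k1` and `b` and keep the pair whose rankings best agree with the generated ranked dataset; the agreement measure and search procedure are left open here because the abstract does not specify them.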

information retrieval, large language model, machine learning, (17 more...)

2505.04732

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Law (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Tereshchenko, Yehor, Hämäläinen, Mika

A Comparative Analysis of Ethical and Safety Gaps in LLMs using Relative Danger Coefficient

Artificial Intelligence (AI) and Large Language Models (LLMs) have rapidly evolved in recent years, showcasing remarkable capabilities in natural language understanding and generation. However, these advancements also raise critical ethical questions regarding safety, potential misuse, discrimination and overall societal impact. This article provides a comparative analysis of the ethical performance of various AI models, including the brand new DeepSeek-V3 (R1 with reasoning and without), various GPT variants (4o, 3.5 Turbo, 4 Turbo, o1/o3 mini) and Gemini (1.5 flash, 2.0 flash and 2.0 flash exp), and highlights the need for robust human oversight, especially in situations with high stakes. Furthermore, we present a new metric for calculating harm in LLMs called Relative Danger Coefficient (RDC).

large language model, machine learning, natural language, (17 more...)

2505.04654

Genre: Research Report (0.82)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Health & Medicine (1.00)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Scientific Hypothesis Generation and Validation: Methods, Datasets, and Future Directions

Kulkarni, Adithya, Alotaibi, Fatimah, Zeng, Xinyue, Wu, Longfeng, Zeng, Tong, Yao, Barry Menglong, Liu, Minqian, Zhang, Shuaicheng, Huang, Lifu, Zhou, Dawei

Large Language Models (LLMs) are transforming scientific hypothesis generation and validation by enabling information synthesis, latent relationship discovery, and reasoning augmentation. This survey provides a structured overview of LLM-driven approaches, including symbolic frameworks, generative models, hybrid systems, and multi-agent architectures. We examine techniques such as retrieval-augmented generation, knowledge-graph completion, simulation, causal inference, and tool-assisted reasoning, highlighting trade-offs in interpretability, novelty, and domain alignment. We contrast early symbolic discovery systems (e.g., BACON, KEKADA) with modern LLM pipelines that leverage in-context learning and domain adaptation via fine-tuning, retrieval, and symbolic grounding. For validation, we review simulation, human-AI collaboration, causal modeling, and uncertainty quantification, emphasizing iterative assessment in open-world contexts. The survey maps datasets across biomedicine, materials science, environmental science, and social science, introducing new resources like AHTech and CSKG-600. Finally, we outline a roadmap emphasizing novelty-aware generation, multimodal-symbolic integration, human-in-the-loop systems, and ethical safeguards, positioning LLMs as agents for principled, scalable scientific discovery.

large language model, machine learning, natural language, (21 more...)

2505.04651

Country:

Asia (0.67)
North America > United States > California (0.27)

Genre:

Research Report > Promising Solution (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Woodburn, Madeleine, Griggs, Wynita M., Marecek, Jakub, Shorten, Robert N.

Herd Routes: A Preventative IoT-Based System for Improving Female Pedestrian Safety on City Streets

Over two thirds of women of all ages in the UK have experienced some form of sexual harassment in a public space. Recent tragic incidents involving female pedestrians have highlighted some of the personal safety issues that women still face in cities today. There exist many popular location-based safety applications as a result of this; however, these applications tend to take a reactive approach where action is taken only after an incident has occurred. This paper proposes a preventative approach to the problem by creating safer public environments through societal incentivisation. The proposed system, called "Herd Routes", improves the safety of female pedestrians by generating busier pedestrian routes as a result of route incentivisation. A novel application of distributed ledgers is proposed to provide security and trust, a record of system users' locations and IDs, and a platform for token exchange. A proof-of-concept was developed using the simulation package SUMO (Simulation of Urban Mobility), and a smartphone app. With positive results from the initial testing of the proof-of-concept, further development could significantly contribute towards creating safer pedestrian routes through cities, and tackle the societal change that is required to improve female pedestrian safety in the long term.

Females of all ages face gender inequities in everyday life, and the associated feelings of compromised safety and fearfulness that can arise. Of course, in these situations, women do as much as they can to prioritise their personal safety. Notably, women approach walking through cities with extreme caution, especially at night. In London, for example, there are ongoing initiatives such as the UN Women's Global initiative of "Safe Cities and Safe Public Spaces for Women and Girls", which commits to identifying gender-responsive, locally relevant and owned interventions [1].

artificial intelligence, incentivised route, internet of things, (16 more...)

doi: 10.1080/00207179.2024.2380025

2207.05279

Country: Europe > United Kingdom > England > Greater London > London (0.28)

Genre:

Research Report (0.50)
Questionnaire & Opinion Survey (0.46)
Overview (0.34)

Industry:

Transportation > Ground > Road (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Internet of Things (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Communications > Mobile (0.68)

arXiv.org Artificial Intelligence, May-8-2025

A Reasoning-Focused Legal Retrieval Benchmark

Zheng, Lucia, Guha, Neel, Arifov, Javokhir, Zhang, Sarah, Skreta, Michal, Manning, Christopher D., Henderson, Peter, Ho, Daniel E.

As the legal community increasingly examines the use of large language models (LLMs) for various legal applications, legal AI developers have turned to retrieval-augmented LLMs ("RAG" systems) to improve system performance and robustness. An obstacle to the development of specialized RAG systems is the lack of realistic legal RAG benchmarks which capture the complexity of both legal retrieval and downstream legal question-answering. To address this, we introduce two novel legal RAG benchmarks: Bar Exam QA and Housing Statute QA. Our tasks correspond to real-world legal research tasks, and were produced through annotation processes which resemble legal research. We describe the construction of these benchmarks and the performance of existing retriever pipelines. Our results suggest that legal RAG remains a challenging application, thus motivating future research.

large language model, machine learning, natural language, (14 more...)

doi: 10.1145/3709025.3712219

2505.0397

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
North America > United States > California > Santa Clara County > Stanford (0.04)
(14 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.92)
Education > Educational Setting (0.68)
Law > Litigation (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)