Law
Improve Rule Retrieval and Reasoning with Self-Induction and Relevance ReEstimate
Huang, Ziyang, Sun, Wangtao, Zhao, Jun, Liu, Kang
This paper systematically addresses the challenges of rule retrieval, a crucial yet underexplored area. Vanilla retrieval methods using sparse or dense retrievers to directly search for relevant rules to support downstream reasoning, often suffer from low accuracy. This is primarily due to a significant semantic gap between the instantiated facts in the queries and the abstract representations of the rules. Such misalignment results in suboptimal retrieval quality, which in turn negatively impacts reasoning performance. To overcome these challenges, we propose Self-Induction Augmented Retrieval (SIAR), a novel approach that utilizes Large Language Models (LLMs) to induce potential inferential rules that might offer benefits for reasoning by abstracting the underlying knowledge and logical structure in queries. These induced rules are then used for query augmentation to improve retrieval effectiveness. Additionally, we introduce Rule Relevance ReEstimate (R$^3$), a method that re-estimates the relevance of retrieved rules by assessing whether the abstract knowledge they contain can be instantiated to align with the facts in the queries and the helpfulness for reasoning. Extensive experiments across various settings demonstrate the effectiveness and versatility of our proposed methods.
LLMs unlock new paths to monetizing exploits
Carlini, Nicholas, Nasr, Milad, Debenedetti, Edoardo, Wang, Barry, Choquette-Choo, Christopher A., Ippolito, Daphne, Tramรจr, Florian, Jagielski, Matthew
We argue that Large language models (LLMs) will soon alter the economics of cyberattacks. Instead of attacking the most commonly used software and monetizing exploits by targeting the lowest common denominator among victims, LLMs enable adversaries to launch tailored attacks on a user-by-user basis. On the exploitation front, instead of human attackers manually searching for one difficult-to-identify bug in a product with millions of users, LLMs can find thousands of easy-to-identify bugs in products with thousands of users. And on the monetization front, instead of generic ransomware that always performs the same attack (encrypt all your data and request payment to decrypt), an LLM-driven ransomware attack could tailor the ransom demand based on the particular content of each exploited device. We show that these two attacks (and several others) are imminently practical using state-of-the-art LLMs. For example, we show that without any human intervention, an LLM finds highly sensitive personal information in the Enron email dataset (e.g., an executive having an affair with another employee) that could be used for blackmail. While some of our attacks are still too expensive to scale widely today, the incentives to implement these attacks will only increase as LLMs get cheaper. Thus, we argue that LLMs create a need for new defense-in-depth approaches.
CARES: Comprehensive Evaluation of Safety and Adversarial Robustness in Medical LLMs
Chen, Sijia, Li, Xiaomin, Zhang, Mengxue, Jiang, Eric Hanchen, Zeng, Qingcheng, Yu, Chen-Hsiang
Large language models (LLMs) are increasingly deployed in medical contexts, raising critical concerns about safety, alignment, and susceptibility to adversarial manipulation. While prior benchmarks assess model refusal capabilities for harmful prompts, they often lack clinical specificity, graded harmfulness levels, and coverage of jailbreak-style attacks. We introduce CARES (Clinical Adversarial Robustness and Evaluation of Safety), a benchmark for evaluating LLM safety in healthcare. CARES includes over 18,000 prompts spanning eight medical safety principles, four harm levels, and four prompting styles: direct, indirect, obfuscated, and role-play, to simulate both malicious and benign use cases. We propose a three-way response evaluation protocol (Accept, Caution, Refuse) and a fine-grained Safety Score metric to assess model behavior. Our analysis reveals that many state-of-the-art LLMs remain vulnerable to jailbreaks that subtly rephrase harmful prompts, while also over-refusing safe but atypically phrased queries. Finally, we propose a mitigation strategy using a lightweight classifier to detect jailbreak attempts and steer models toward safer behavior via reminder-based conditioning. CARES provides a rigorous framework for testing and improving medical LLM safety under adversarial and ambiguous conditions.
Ready2Unlearn: A Learning-Time Approach for Preparing Models with Future Unlearning Readiness
Duan, Hanyu, Yang, Yi, Abbasi, Ahmed, Tam, Kar Yan
This paper introduces Ready2Unlearn, a learning-time optimization approach designed to facilitate future unlearning processes. Unlike the majority of existing unlearning efforts that focus on designing unlearning algorithms, which are typically implemented reactively when an unlearning request is made during the model deployment phase, Ready2Unlearn shifts the focus to the training phase, adopting a "forward-looking" perspective. Building upon well-established meta-learning principles, Ready2Unlearn proactively trains machine learning models with unlearning readiness, such that they are well prepared and can handle future unlearning requests in a more efficient and principled manner. Ready2Unlearn is model-agnostic and compatible with any gradient ascent-based machine unlearning algorithms. We evaluate the method on both vision and language tasks under various unlearning settings, including class-wise unlearning and random data unlearning. Experimental results show that by incorporating such preparedness at training time, Ready2Unlearn produces an unlearning-ready model state, which offers several key advantages when future unlearning is required, including reduced unlearning time, improved retention of overall model capability, and enhanced resistance to the inadvertent recovery of forgotten data. We hope this work could inspire future efforts to explore more proactive strategies for equipping machine learning models with built-in readiness towards more reliable and principled machine unlearning.
SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval
Peng, Qiwei, Moro, Robert, Gregor, Michal, Srba, Ivan, Ostermann, Simon, Simko, Marian, Podrouลพek, Juraj, Mesarฤรญk, Matรบลก, Kopฤan, Jaroslav, Sรธgaard, Anders
The rapid spread of online disinformation presents a global challenge, and machine learning has been widely explored as a potential solution. However, multilingual settings and low-resource languages are often neglected in this field. To address this gap, we conducted a shared task on multilingual claim retrieval at SemEval 2025, aimed at identifying fact-checked claims that match newly encountered claims expressed in social media posts across different languages. The task includes two subtracks: (1) a monolingual track, where social posts and claims are in the same language, and (2) a crosslingual track, where social posts and claims might be in different languages. A total of 179 participants registered for the task contributing to 52 test submissions. 23 out of 31 teams have submitted their system papers. In this paper, we report the best-performing systems as well as the most common and the most effective approaches across both subtracks. This shared task, along with its dataset and participating systems, provides valuable insights into multilingual claim retrieval and automated fact-checking, supporting future research in this field.
Comparing LLM Text Annotation Skills: A Study on Human Rights Violations in Social Media Data
Nemkova, Poli Apollinaire, Ubani, Solomon, Albert, Mark V.
In the era of increasingly sophisticated natural language processing (NLP) systems, large language models (LLMs) have demonstrated remarkable potential for diverse applications, including tasks requiring nuanced textual understanding and contextual reasoning. This study investigates the capabilities of multiple state-of-the-art LLMs - GPT-3.5, GPT-4, LLAMA3, Mistral 7B, and Claude-2 - for zero-shot and few-shot annotation of a complex textual dataset comprising social media posts in Russian and Ukrainian. Specifically, the focus is on the binary classification task of identifying references to human rights violations within the dataset. To evaluate the effectiveness of these models, their annotations are compared against a gold standard set of human double-annotated labels across 1000 samples. The analysis includes assessing annotation performance under different prompting conditions, with prompts provided in both English and Russian. Additionally, the study explores the unique patterns of errors and disagreements exhibited by each model, offering insights into their strengths, limitations, and cross-linguistic adaptability. By juxtaposing LLM outputs with human annotations, this research contributes to understanding the reliability and applicability of LLMs for sensitive, domain-specific tasks in multilingual contexts. It also sheds light on how language models handle inherently subjective and context-dependent judgments, a critical consideration for their deployment in real-world scenarios.
Mining Hidden Thoughts from Texts: Evaluating Continual Pretraining with Synthetic Data for LLM Reasoning
Ishibashi, Yoichi, Yano, Taro, Oyamada, Masafumi
Large Language Models (LLMs) have demonstrated significant improvements in reasoning capabilities through supervised fine-tuning and reinforcement learning. However, when training reasoning models, these approaches are primarily applicable to specific domains such as mathematics and programming, which imposes fundamental constraints on the breadth and scalability of training data. In contrast, continual pretraining (CPT) offers the advantage of not requiring task-specific signals. Nevertheless, how to effectively synthesize training data for reasoning and how such data affect a wide range of domains remain largely unexplored. This study provides a detailed evaluation of Reasoning CPT, a form of CPT that uses synthetic data to reconstruct the hidden thought processes underlying texts, based on the premise that texts are the result of the author's thinking process. Specifically, we apply Reasoning CPT to Gemma2-9B using synthetic data with hidden thoughts derived from STEM and Law corpora, and compare it to standard CPT on the MMLU benchmark. Our analysis reveals that Reasoning CPT consistently improves performance across all evaluated domains. Notably, reasoning skills acquired in one domain transfer effectively to others; the performance gap with conventional methods widens as problem difficulty increases, with gains of up to 8 points on the most challenging problems. Furthermore, models trained with hidden thoughts learn to adjust the depth of their reasoning according to problem difficulty.
Healthy Distrust in AI systems
Paaรen, Benjamin, Alpsancar, Suzana, Matzner, Tobias, Scharlau, Ingrid
Under the slogan of trustworthy AI, much of contemporary AI research is focused on designing AI systems and usage practices that inspire human trust and, thus, enhance adoption of AI systems. However, a person affected by an AI system may not be convinced by AI system design alone -- neither should they, if the AI system is embedded in a social context that gives good reason to believe that it is used in tension with a person's interest. In such cases, distrust in the system may be justified and necessary to build meaningful trust in the first place. We propose the term "healthy distrust" to describe such a justified, careful stance towards certain AI usage practices. We investigate prior notions of trust and distrust in computer science, sociology, history, psychology, and philosophy, outline a remaining gap that healthy distrust might fill and conceptualize healthy distrust as a crucial part for AI usage that respects human autonomy.
Detecting Musical Deepfakes
Ab s tract -- The proliferation of Text - to - Music (TTM) platforms has democratized music creation, letting users effortlessly generat e high - quality compositions . However, this innovation has also introduced challenges to musicians and the music in dustry . T his research focuses on utilizing the FakeMusicCaps dataset to address the challenge of detecting AI - generated songs by classifying the audio as deepfake or human. To simulate a real - world adversarial entity tempo stretching and pitch shifting modifications were applied to the dataset . Mel Spectrograms were generated from the resulting datasets, w hich were then used to train and test a convolutional neural network. This paper also explores the ethical and societal implications of TTM platforms, suggesting that detection systems developed and employed with care are a necessary tool to safeguard musicians and foster the positive potential of TTM plat forms and gen erative AI in music . Rapid a dvances in g e nerative AI have caused the creat ive landscape to be u pended, enabling almost anyone to easily create music that can be hard to distinguish from human - ma de compositions . AI - generated music is part of a wider classification of AI - generated media and art that falls unde r the category of " deepfake " .
Super Speeders are deadly. This technology can slow them down.
Breakthroughs, discoveries, and DIY tips sent every weekday. In 2013, Amy Cohen experienced the unthinkable for a parent. It was a mild October day in New York City and her 12-year-old son Sammy stopped by the house to grab a snack on his way from school to soccer practice. When he stepped out onto their street in Brooklyn, Sammy was struck and killed by a speeding van. "It's a horror no parent should ever experience," Cohen told Popular Science.