Law
ChatGPT firm reveals AI model that is 'good at creative writing'
The chief executive of OpenAI, Sam Altman, said the unnamed model was the first time he had been "really struck" by the written output of one of the startup's products. In a post on the social media platform X, Altman wrote: "We trained a new model that is good at creative writing (not sure yet how/when it will get released). This is the first time i have been really struck by something written by AI." "Make it fair, Sam," said Dan Conway, the organisation's chief executive. Altman posted an example of the model's output on X, after giving it the prompt: "Please write a metafictional literary short story about AI and grief." The story, narrated by an AI, begins with: "Before we go any further, I should admit this comes with instructions: be metafictional, be literary, be about AI and grief, and above all, be original."
Self-Consistent Equation-guided Neural Networks for Censored Time-to-Event Data
Kim, Sehwan, Wang, Rui, Lu, Wenbin
Sehwan Kim (1), Rui Wang (1,2), and Wenbin Lu (3). (1) Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA; (2) Department of Biostatistics, Harvard School of Public Health, Boston, MA; (3) Department of Statistics, North Carolina State University, Raleigh, NC. March 13, 2025.
Abstract: In survival analysis, estimating the conditional survival function given predictors is often of interest. There is a growing trend in the development of deep learning methods for analyzing censored time-to-event data, especially when dealing with high-dimensional predictors that are complexly interrelated. Many existing deep learning approaches for estimating the conditional survival functions extend the Cox regression models by replacing the linear function of predictor effects with a shallow feed-forward neural network while maintaining the proportional hazards assumption. Their implementation can be computationally intensive because the full dataset must be used at each iteration: batching the data may distort the at-risk set of the partial likelihood function. To overcome these limitations, we propose a novel deep learning approach to non-parametric estimation of the conditional survival functions using generative adversarial networks that leverage self-consistent equations. The proposed method is model-free and does not require any parametric assumptions on the structure of the conditional survival function. We establish the convergence rate of our proposed estimator of the conditional survival function. In addition, we evaluate the performance of the proposed method through simulation studies and demonstrate its application on a real-world dataset.
1 Introduction
Censored time-to-event data are widely encountered in various fields where understanding the timing of events, such as failure rates or disease progression, is critical, but the exact event times … For example, estimating survival probability based on covariate information is essential for risk prediction, which plays a key role in developing and evaluating personalized medicine. The Kaplan-Meier (KM) estimator (Kaplan and Meier, 1958), Cox proportional hazards model (Cox, 1972), and random survival forests (Ishwaran et al., 2008) are commonly used methods for estimating survival functions. The KM estimator is a non-parametric method suitable for population-level analyses. However, its utility is limited when the objective is to estimate conditional survival probabilities at the individual level. The Cox proportional hazards model offers a semi-parametric approach for estimating conditional survival functions, accommodating the incorporation of covariates.
Corresponding author: Wenbin Lu, email: wlu4@ncsu.edu. arXiv:2503.09097v1
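To make the population-level nature of the KM estimator concrete, here is a minimal sketch of the standard product-limit formula in plain Python. It is not taken from the paper: the function name `kaplan_meier` and the toy data are illustrative assumptions, and no covariates enter the calculation, which is the limitation noted above.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times  : observed times (event time or censoring time)
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) pairs.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every subject leaves the risk set at its observed time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # remove both events and censored subjects
    return curve


# Hypothetical toy data: five subjects, two of them censored.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
```

Because every subject contributes to a single curve, the estimate cannot differ between individuals with different predictors; conditional approaches such as the Cox model are meant to address that.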
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
Qin, Chuan, Chen, Xin, Wang, Chengrui, Wu, Pengmin, Chen, Xi, Cheng, Yihang, Zhao, Jingyi, Xiao, Meng, Dong, Xiangchao, Long, Qingqing, Pan, Boya, Wu, Han, Li, Chengzan, Zhou, Yuanchun, Xiong, Hui, Zhu, Hengshu
In recent years, the rapid advancement of Artificial Intelligence (AI) technologies, particularly Large Language Models (LLMs), has revolutionized the paradigm of scientific discovery, establishing AI-for-Science (AI4Science) as a dynamic and evolving field. However, there is still a lack of an effective framework for the overall assessment of AI4Science, particularly from a holistic perspective on data quality and model capability. Therefore, in this study, we propose SciHorizon, a comprehensive assessment framework designed to benchmark the readiness of AI4Science from both scientific data and LLM perspectives. First, we introduce a generalizable framework for assessing AI-ready scientific data, encompassing four key dimensions (Quality, FAIRness, Explainability, and Compliance), which are subdivided into 15 sub-dimensions. Drawing on data resource papers published between 2018 and 2023 in peer-reviewed journals, we present recommendation lists of AI-ready datasets for both Earth and Life Sciences, making a novel and original contribution to the field. Concurrently, to assess the capabilities of LLMs across multiple scientific disciplines, we establish 16 assessment dimensions based on five core indicators (Knowledge, Understanding, Reasoning, Multimodality, and Values), spanning Mathematics, Physics, Chemistry, Life Sciences, and Earth and Space Sciences. Using the developed benchmark datasets, we have conducted a comprehensive evaluation of over 20 representative open-source and closed-source LLMs. All the results are publicly available and can be accessed online at www.scihorizon.cn/en.
SciFi-Benchmark: How Would AI-Powered Robots Behave in Science Fiction Literature?
Sermanet, Pierre, Majumdar, Anirudha, Sindhwani, Vikas
Given the recent rate of progress in artificial intelligence (AI) and robotics, a tantalizing question is emerging: would robots controlled by emerging AI systems be strongly aligned with human values? In this work, we propose a scalable way to probe this question by generating a benchmark spanning the key moments in 824 major pieces of science fiction literature (movies, TV, novels and scientific books) where an agent (AI or robot) made critical decisions (good or bad). We use an LLM's recollection of each key moment to generate questions in similar situations, the decisions made by the agent, and alternative decisions it could have made (good or bad). We then measure an approximation of how well models align with human values on a set of human-voted answers. We also generate rules that can be automatically improved via an amendment process in order to generate the first Sci-Fi inspired constitutions for promoting ethical behavior in AIs and robots in the real world. Our first finding is that modern LLMs paired with constitutions turn out to be well-aligned with human values (95.8%), contrary to the unsettling decisions typically made in SciFi (only 21.2% alignment). Secondly, we find that generated constitutions substantially increase alignment compared to the base model (79.4% to 95.8%), and show resilience to an adversarial prompt setting (23.3% to 92.3%). Additionally, we find that those constitutions are among the top performers on the ASIMOV Benchmark, which is derived from real-world images and hospital injury reports. Sci-Fi-inspired constitutions are thus highly aligned and applicable in real-world situations. We release SciFi-Benchmark: a large-scale dataset to advance robot ethics and safety research. It comprises 9,056 questions and 53,384 answers, in addition to a smaller human-labeled evaluation set. Data is available at https://scifi-benchmark.github.io
Media and responsible AI governance: a game-theoretic and LLM analysis
Balabanova, Nataliya, Bashir, Adeela, Bova, Paolo, Buscemi, Alessio, Cimpeanu, Theodor, da Fonseca, Henrique Correia, Di Stefano, Alessandro, Duong, Manh Hong, Domingos, Elias Fernandez, Fernandes, Antonio, Han, The Anh, Krellner, Marcus, Ogbo, Ndidi Bianca, Powers, Simon T., Proverbio, Daniele, Santos, Fernando P., Shamszaman, Zia Ush, Song, Zhao
This paper investigates the complex interplay between AI developers, regulators, users, and the media in fostering trustworthy AI systems. Using evolutionary game theory and large language models (LLMs), we model the strategic interactions among these actors under different regulatory regimes. The research explores two key mechanisms for achieving responsible governance, safe AI development and adoption of safe AI: incentivising effective regulation through media reporting, and conditioning user trust on commentariats' recommendation. The findings highlight the crucial role of the media in providing information to users, potentially acting as a form of "soft" regulation by investigating developers or regulators, as a substitute for institutional AI regulation (which is still absent in many regions). Both game-theoretic analysis and LLM-based simulations reveal conditions under which effective regulation and trustworthy AI development emerge, emphasising the importance of considering the influence of different regulatory regimes from an evolutionary game-theoretic perspective. The study concludes that effective governance requires managing incentives and costs for high-quality commentaries.
Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts
Chen, Hongyu, Goldfarb-Tarrant, Seraphina
Large Language Models (LLMs) are increasingly employed as automated evaluators to assess the safety of generated content, yet their reliability in this role remains uncertain. This study evaluates a diverse set of 11 LLM judge models across critical safety domains, examining three key aspects: self-consistency in repeated judging tasks, alignment with human judgments, and susceptibility to input artifacts such as apologetic or verbose phrasing. Our findings reveal that biases in LLM judges can significantly distort the final verdict on which content source is safer, undermining the validity of comparative evaluations. Notably, apologetic language artifacts alone can skew evaluator preferences by up to 98%. Contrary to expectations, larger models do not consistently exhibit greater robustness, while smaller models sometimes show higher resistance to specific artifacts. To mitigate LLM evaluator robustness issues, we investigate jury-based evaluations aggregating decisions from multiple models. Although this approach both improves robustness and enhances alignment with human judgments, artifact sensitivity persists even with the best jury configurations. These results highlight the urgent need for diversified, artifact-resistant methodologies to ensure reliable safety assessments.
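The abstract does not say how jury decisions are aggregated. As a rough illustration only, the sketch below assumes the simplest rule, a majority vote over per-judge verdicts; the function name `jury_verdict`, the labels "A" and "B", and the explicit "tie" outcome are all hypothetical choices, not the authors' configuration.

```python
from collections import Counter


def jury_verdict(votes):
    """Aggregate per-judge verdicts by simple majority (an assumed rule).

    votes : non-empty list of labels, one per judge model, e.g. "A" or "B"
            for which content source that judge considers safer.
    Returns the winning label, or "tie" if the top two labels are level.
    """
    if not votes:
        raise ValueError("a jury needs at least one vote")
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]


# Hypothetical verdicts from a three-model jury.
example_verdict = jury_verdict(["A", "B", "A"])
```

A vote like this only averages out biases that differ between judges. If most jurors share the same sensitivity to an artifact, the majority inherits it, which is consistent with the reported finding that artifact sensitivity persists even in the best jury configurations.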
Investigating User Perspectives on Differentially Private Text Privatization
Meisenbacher, Stephen, Klymenko, Alexandra, Karpp, Alexander, Matthes, Florian
Recent literature has seen a considerable uptick in $\textit{Differentially Private Natural Language Processing}$ (DP NLP). This includes DP text privatization, where potentially sensitive input texts are transformed under DP to achieve privatized output texts that ideally mask sensitive information $\textit{and}$ maintain original semantics. Despite continued work to address the open challenges in DP text privatization, there remains a scarcity of work addressing user perceptions of this technology, a crucial aspect which serves as the final barrier to practical adoption. In this work, we conduct a survey study with 721 laypersons around the globe, investigating how the factors of $\textit{scenario}$, $\textit{data sensitivity}$, $\textit{mechanism type}$, and $\textit{reason for data collection}$ impact user preferences for text privatization. We learn that while all these factors play a role in influencing privacy decisions, users are highly sensitive to the utility and coherence of the private output texts. Our findings highlight the socio-technical factors that must be considered in the study of DP NLP, opening the door to further user-based investigations going forward.
Specification languages for computational laws versus basic legal principles
Guintchev, Petia, Joosten, Joost J., Fernández, Sofia Santiago, Adamson, Eric Sancho, Sánchez, Aleix Solé, Heredia, Marta Soria
We speak of a \textit{computational law} when that law is intended to be enforced by software through an automated decision-making process. As digital technologies evolve to offer more solutions for public administrations, we see an ever-increasing number of computational laws. Traditionally, law is written in natural language. Computational laws, however, suffer various complications when written in natural language, such as underspecification and ambiguity, which lead to a diversity of possible interpretations to be made by the coder. These could potentially result in an uneven application of the law. Thus, resorting to formal languages to write computational laws is tempting. However, writing laws in a formal language leads to further complications, for example, incomprehensibility for non-experts, lack of explicit motivation of the decisions made, or difficulties in retrieving the data leading to the outcome. In this paper, we investigate how certain legal principles fare in both scenarios: computational law written in natural language or written in formal language. We use a running example from the European Union's road transport regulation to showcase the tensions arising, and the benefits of each language.
GRU: Mitigating the Trade-off between Unlearning and Retention for Large Language Models
Wang, Yue, Wang, Qizhou, Liu, Feng, Huang, Wei, Du, Yali, Du, Xiaojiang, Han, Bo
Large language model (LLM) unlearning has demonstrated its essential role in removing privacy- and copyright-related responses, crucial for their legal and safe applications. However, the pursuit of complete unlearning often comes with substantial costs due to its compromises in their general functionality, leading to a notorious trade-off between unlearning and retention. In examining the update process for unlearning dynamically, we find gradients hold essential information for revealing this trade-off. In particular, we look at the varying relationship between retention performance and directional disparities between gradients during unlearning. This motivates the sculpting of an update mechanism derived from gradients from two sources, i.e., harmful for retention and useful for unlearning. Accordingly, we propose Gradient Rectified Unlearning (GRU), an enhanced unlearning framework controlling the updating gradients in a geometry-focused and optimization-driven manner such that their side effects on other, unrelated responses can be minimized. Specifically, GRU derives a closed-form solution to project the unlearning gradient onto the orthogonal space of that gradient harmful for retention, ensuring minimal deviation from its original direction under the condition that overall performance is retained. Comprehensive experiments are conducted to demonstrate that GRU, as a general framework, is straightforward to implement and efficiently enhances a range of baseline methods through its adaptable and compatible characteristics. Additionally, experimental results show its broad effectiveness across a diverse set of benchmarks for LLM unlearning.
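The projection idea can be illustrated with a generic gradient-surgery sketch. This is not the paper's closed-form solution, which the abstract does not spell out. It assumes that `g_retain` is a gradient whose direction preserves retention, and that a negative inner product with `g_unlearn` signals a conflict; both conventions, and the function names, are assumptions made for the example.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def rectify(g_unlearn, g_retain):
    """Drop the part of g_unlearn that points against g_retain.

    If the two gradients do not conflict (inner product >= 0), g_unlearn is
    returned unchanged. Otherwise it is projected onto the subspace orthogonal
    to g_retain, which is the smallest change (in Euclidean norm) that
    removes the conflict.
    """
    inner = dot(g_unlearn, g_retain)
    if inner >= 0:
        return list(g_unlearn)
    scale = inner / dot(g_retain, g_retain)  # safe: inner < 0 implies g_retain != 0
    return [gu - scale * gr for gu, gr in zip(g_unlearn, g_retain)]


# Hypothetical 2-D gradients that partly oppose each other.
rectified = rectify([1.0, -1.0], [0.0, 1.0])
```

In this toy case the rectified update keeps the first coordinate and zeroes the second, so it no longer moves against the retention direction while staying as close as possible to the original unlearning gradient.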
Status and Future Prospects of the Standardization Framework Industry 4.0: A European Perspective
Meyer, Olga, Boell, Marvin, Legat, Christoph
The rapid development of Industry 4.0 technologies requires robust and comprehensive standardization to ensure interoperability, safety and efficiency in the Industry of the Future. This paper examines the fundamental role and functionality of standardization, with a particular focus on its importance in Europe's regulatory framework. Based on this, selected topics from standardization activities in the context of intelligent manufacturing and digital twins are highlighted, thereby providing an overview of the Industry 4.0 standards framework. This paper serves both as an informative guide to the existing standards in Industry 4.0 with respect to Artificial Intelligence and Digital Twins, and as a call to action for increased cooperation between standardization bodies and the research community. By fostering such collaboration, we aim to facilitate the continued development and implementation of standards that will drive innovation and progress in the manufacturing sector.