AI's Memorization Crisis

The Atlantic - Technology

Large language models don't "learn"--they copy. And that could change everything for the tech industry. On Tuesday, researchers at Stanford and Yale revealed something that AI companies would prefer to keep hidden. Four popular large language models--OpenAI's GPT, Anthropic's Claude, Google's Gemini, and xAI's Grok--have stored large portions of some of the books they've been trained on, and can reproduce long excerpts from those books. In fact, when prompted strategically by researchers, Claude delivered the near-complete text of several books, in addition to thousands of words from several others.


Opacity as Authority: Arbitrariness and the Preclusion of Contestation

Kayembe, Naomi Omeonga wa

arXiv.org Artificial Intelligence

This article redefines arbitrariness not as a normative flaw or a symptom of domination, but as a foundational functional mechanism structuring human systems and interactions. Diverging from critical traditions that conflate arbitrariness with injustice, it posits arbitrariness as a semiotic trait: a property enabling systems - linguistic, legal, or social - to operate effectively while withholding their internal rationale. Building on Ferdinand de Saussure's concept of l'arbitraire du signe, the analysis extends this principle beyond language to demonstrate its cross-domain applicability, particularly in law and social dynamics. The paper introduces the "Motivation -> Constatability -> Contestability" chain, arguing that motivation functions as a crucial interface rendering an act's logic vulnerable to intersubjective contestation. When this chain is broken through mechanisms like "immotivization" or "Conflict Lateralization" (exemplified by "the blur of the wolf drowned in the fish"), acts produce binding effects without exposing their rationale, thus precluding justiciability. This structural opacity, while appearing illogical, is a deliberate design protecting authority from accountability. Drawing on Shannon's entropy model, the paper formalizes arbitrariness as A = H(L|M) (conditional entropy). It thereby proposes a modern theory of arbitrariness as a neutral operator central to control as well as care, an overlooked dimension of interpersonal relations. While primarily developed through human social systems, this framework also illuminates a new pathway for analyzing explainability in advanced artificial intelligence systems.
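For readers unfamiliar with the notation, the following is a minimal sketch of the conditional-entropy quantity A = H(L|M). The variable names (acts L, disclosed motivations M) and the toy joint distribution are illustrative assumptions for this digest, not values taken from the paper.

```python
import math

# Toy joint distribution p(l, m) over acts L and disclosed motivations M.
# Illustrative numbers only -- the paper defines A = H(L|M) abstractly.
joint = {
    ("approve", "stated"):   0.30,
    ("approve", "withheld"): 0.20,
    ("deny",    "stated"):   0.10,
    ("deny",    "withheld"): 0.40,
}

def conditional_entropy(joint):
    """H(L|M) = -sum over (l, m) of p(l, m) * log2(p(l, m) / p(m))."""
    # Marginal p(m), summed over acts.
    p_m = {}
    for (l, m), p in joint.items():
        p_m[m] = p_m.get(m, 0.0) + p
    h = 0.0
    for (l, m), p in joint.items():
        if p > 0:
            h -= p * math.log2(p / p_m[m])
    return h

print(f"A = H(L|M) = {conditional_entropy(joint):.3f} bits")
```

On this toy distribution the sketch prints about 0.875 bits. When the disclosed motivation fully determines the act, H(L|M) is zero; the quantity grows as acts become less predictable given their stated motivations, matching the paper's reading of arbitrariness as withheld rationale.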


Unfair Learning: GenAI Exceptionalism and Copyright Law

Atkinson, David

arXiv.org Artificial Intelligence

This paper examines fair use legal arguments and eight distinct substantive arguments, contending that every legal and substantive argument favoring fair use for GenAI applies equally, if not more so, to humans. Granting GenAI exceptional privileges in this domain is therefore legally and logically inconsistent with withholding broad fair use exemptions from individual humans.


TechScape: Elon Musk is stumping hard for Donald Trump

The Guardian

Thank you for joining me. Elon Musk is stumping hard for Donald Trump. The Tesla and SpaceX CEO has funded a pro-Trump political action committee with tens of millions of dollars and planned a packed campaign schedule to boost the former president in Pennsylvania. He speaks to Trump multiple times per week and has urged other billionaires to endorse the Republican candidate en masse in private gatherings, according to the New York Times. Taken together, Musk's actions amount to something unprecedented in modern times – a man who is both the richest in the world and owner of an influential means of mass communication throwing all his weight behind a political candidate.


Randomization Techniques to Mitigate the Risk of Copyright Infringement

Chen, Wei-Ning, Kairouz, Peter, Oh, Sewoong, Xu, Zheng

arXiv.org Artificial Intelligence

In this paper, we investigate potential randomization approaches that can complement current practices of input-based methods (such as licensing data and prompt filtering) and output-based methods (such as recitation checker, license checker, and model-based similarity score) for copyright protection. This is motivated by the inherent ambiguity of the rules that determine substantial similarity in copyright precedents. Given that there is no agreed-upon quantifiable measure of substantial similarity, complementary approaches can potentially further decrease liability. Similar randomized approaches, such as differential privacy, have been successful in mitigating privacy risks. This document focuses on the technical and research perspective on mitigating copyright violation and hence is not confidential. After investigating potential solutions and running numerical experiments, we concluded that using the notion of Near Access-Freeness (NAF) to measure the degree of substantial similarity is challenging, and that the standard approach of training a Differentially Private (DP) model incurs significant costs when used to ensure NAF. Alternative approaches, such as retrieval models, might provide a more controllable scheme for mitigating substantial similarity.
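As one concrete illustration of the output-based methods the abstract mentions, here is a minimal sketch of an n-gram recitation check. The n-gram size, threshold, and toy corpus are illustrative assumptions for this digest, not the authors' implementation.

```python
def ngrams(tokens, n):
    """Set of successive word n-grams from a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def recitation_score(output_text, corpus_texts, n=5):
    """Fraction of the output's n-grams found verbatim in the reference
    corpus -- a crude proxy for substantial similarity."""
    out = ngrams(output_text.split(), n)
    if not out:
        return 0.0
    corpus = set()
    for doc in corpus_texts:
        corpus |= ngrams(doc.split(), n)
    return len(out & corpus) / len(out)

# Toy usage: flag an output that copies a protected passage verbatim.
protected = ["it was the best of times it was the worst of times"]
generated = "as dickens wrote it was the best of times it was the worst of times"
score = recitation_score(generated, protected)
print(f"recitation score: {score:.2f}")
if score > 0.2:  # threshold is an illustrative assumption
    print("flag output for filtering or randomized resampling")
```

A deterministic check like this is exactly what the paper's randomization approaches would complement: rather than only blocking flagged outputs, a randomized scheme perturbs generation so that near-verbatim recitation becomes unlikely in the first place.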


Google's AI Overview Search Results Copied My Original Work

WIRED

Last week, an AI Overview search result from Google used one of my WIRED articles in an unexpected way that makes me fearful for the future of journalism. I was experimenting with AI Overviews, the company's new generative AI feature designed to answer online queries. I asked it multiple questions about topics I've recently covered, so I wasn't shocked to see my article linked, as a footnote, way at the bottom of the box containing the answer to my query. But I was caught off guard by how much the first paragraph of an AI Overview pulled directly from my writing. The following screenshot on the left is from an interview I conducted with one of Anthropic's product developers about tips for using the company's Claude chatbot.


ReproHum #0087-01: Human Evaluation Reproduction Report for Generating Fact Checking Explanations

Loakman, Tyler, Lin, Chenghua

arXiv.org Artificial Intelligence

This paper presents a partial reproduction of Generating Fact Checking Explanations by Atanasova et al. (2020) as part of the ReproHum (Belz and Thomson, 2024) element of the ReproNLP shared task, which aims to reproduce the findings of NLP research regarding human evaluation and to investigate the extent to which NLP as a field is becoming more or less reproducible over time. Following the instructions provided by the task organisers and the original authors, we collect relative rankings of 3 fact-checking explanations (comprising a gold standard and the outputs of 2 models) for 40 inputs on the criterion of Coverage. The results of our reproduction and reanalysis of the original work's raw results lend support to the original findings, with similar patterns seen between the original work and our reproduction. Whilst we observe slight variation from the original results, our findings support the main conclusions drawn by the original authors pertaining to the efficacy of their proposed models.


Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques

Lorandi, Michela, Belz, Anya

arXiv.org Artificial Intelligence

Rerunning a metric-based evaluation should be more straightforward, and should yield results closer to the original, than rerunning a human-based evaluation, especially where code and model checkpoints are made available by the original authors. However, as this report of our efforts to rerun a metric-based evaluation of a set of single-attribute and multiple-attribute controllable text generation (CTG) techniques shows, such reruns do not always reproduce the original results, and they can reveal errors in the reporting of the original work.


A Second Look on BASS -- Boosting Abstractive Summarization with Unified Semantic Graphs -- A Replication Study

Koraş, Osman Alperen, Schlötterer, Jörg, Seifert, Christin

arXiv.org Artificial Intelligence

We present a detailed replication study of the BASS framework, an abstractive summarization system based on the notion of Unified Semantic Graphs. Our investigation includes challenges in replicating key components and an ablation study to systematically isolate error sources rooted in replicating novel components. Our findings reveal discrepancies in performance compared to the original work. We highlight the significance of paying careful attention even to details that were reasonably omitted when replicating advanced frameworks like BASS, and emphasize key practices for writing replicable papers.


Generative AI Is Challenging a 234-Year-Old Law

The Atlantic - Technology

It took Ralph Ellison seven years to write Invisible Man. It took J. D. Salinger about 10 years to write The Catcher in the Rye. J. K. Rowling spent at least five years on the first Harry Potter book. Writing with the hope of publishing is always a leap of faith. Will you finish the project?