copying
New York Times sues AI startup for 'illegal' copying of millions of articles
New York Times newspaper office building is seen in Manhattan on 26 October 2022. The New York Times sued an embattled artificial intelligence startup on Friday, accusing the firm of illegally copying millions of articles. The newspaper alleged Perplexity AI had distributed and displayed journalists' work without permission en masse. The Times said that Perplexity AI was also violating its trademarks under the Lanham Act, claiming the startup's generative AI products create fabricated content, or "hallucinations", and falsely attribute it to the newspaper by displaying it alongside its registered trademarks.
- North America > United States > New York (0.07)
- Europe > Ukraine (0.07)
- Oceania > Australia (0.05)
- (2 more...)
- Media > News (1.00)
- Law > Intellectual Property & Technology Law (1.00)
- Government > Regional Government > North America Government > United States Government (0.53)
Blameless Users in a Clean Room: Defining Copyright Protection for Generative Models
Are there any conditions under which a generative model's outputs are guaranteed not to infringe the copyrights of its training data? This is the question of "provable copyright protection" first posed by Vyas, Kakade, and Barak (ICML 2023). They define near access-freeness (NAF) and propose it as sufficient for protection. This paper revisits the question and establishes new foundations for provable copyright protection -- foundations that are firmer both technically and legally. First, we show that NAF alone does not prevent infringement. In fact, NAF models can enable verbatim copying, a blatant failure of copy protection that we dub being tainted. Then, we introduce our blameless copy protection framework for defining meaningful guarantees, and instantiate it with clean-room copy protection. Clean-room copy protection allows a user to control their risk of copying by behaving in a way that is unlikely to copy in a counterfactual clean-room setting. Finally, we formalize a common intuition about differential privacy and copyright by proving that DP implies clean-room copy protection when the dataset is golden, a copyright deduplication requirement.
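The NAF condition the abstract critiques bounds how much more likely the model is than a "safe" model to assign probability to any output. A minimal sketch of computing that bound for toy distributions over outputs, using the max-divergence instantiation (function name and dict representation are ours, not the paper's):

```python
import math

def naf_bound(model_probs, safe_probs):
    """Max-divergence log2 max_y p(y)/safe(y): a toy version of the NAF
    parameter k for a single prompt. Distributions are dicts mapping
    outputs to probabilities (our own simplified representation)."""
    k = 0.0
    for y, p in model_probs.items():
        q = safe_probs.get(y, 0.0)
        if p > 0 and q == 0:
            # The model can emit an output the safe model never would:
            # no finite k satisfies the bound.
            return math.inf
        if p > 0:
            k = max(k, math.log2(p / q))
    return k

# Toy example: the model upweights one output relative to the safe model.
model = {"a": 0.5, "b": 0.5}
safe = {"a": 0.25, "b": 0.75}
print(naf_bound(model, safe))  # log2(0.5/0.25) = 1.0
```

Note the failure mode the paper exploits is not visible at this level: a model can satisfy a finite bound for every prompt and still emit verbatim training text on some of them.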
- North America > United States > Pennsylvania (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
Understanding Cross Task Generalization in Handwriting-Based Alzheimer's Screening via Vision Language Adaptation
Gong, Changqing, Qin, Huafeng, El-Yacoubi, Mounim A.
Alzheimer's disease (AD) is a prevalent neurodegenerative disorder for which early detection is critical. Handwriting, often disrupted in prodromal AD, provides a non-invasive and cost-effective window into subtle motor and cognitive decline. Existing handwriting-based AD studies, mostly relying on online trajectories and hand-crafted features, have not systematically examined how task type influences diagnostic performance and cross-task generalization. Meanwhile, large-scale vision-language models have demonstrated remarkable zero- and few-shot anomaly detection in natural images and strong adaptability across medical modalities such as chest X-ray and brain MRI. However, handwriting-based disease detection remains largely unexplored within this paradigm. To close this gap, we introduce a lightweight Cross-Layer Fusion Adapter (CLFA) framework that repurposes CLIP for handwriting-based AD screening. CLFA implants multi-level fusion adapters within the visual encoder to progressively align representations toward handwriting-specific medical cues, enabling prompt-free and efficient zero-shot inference. Using this framework, we systematically investigate cross-task generalization (training on a specific handwriting task and evaluating on unseen ones) to reveal which task types and writing patterns most effectively discriminate AD. Extensive analyses further highlight characteristic stroke patterns and task-level factors that contribute to early AD identification, offering both diagnostic insights and a benchmark for handwriting-based cognitive assessment.
- Asia > China > Chongqing Province > Chongqing (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Europe > France (0.04)
- Asia > South Korea (0.04)
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Jobanputra, Mayank, Veitsman, Yana, Sarrof, Yash, Bakalova, Aleksandra, Demberg, Vera, Pavlick, Ellie, Hahn, Michael
Transformers have theoretical limitations in modeling certain sequence-to-sequence tasks, yet it remains largely unclear whether these limitations play a role in large-scale pretrained LLMs, or whether LLMs effectively overcome these constraints in practice due to the scale of both the models themselves and their pretraining data. We explore how these architectural constraints manifest after pretraining, by studying a family of $\textit{retrieval}$ and $\textit{copying}$ tasks inspired by Liu et al. [2024a]. We use a recently proposed framework for studying length generalization [Huang et al., 2025] to provide guarantees for each of our settings. Empirically, we observe an $\textit{induction-versus-anti-induction}$ asymmetry: pretrained models are better at retrieving tokens to the right (induction) than to the left (anti-induction) of a query token. This asymmetry disappears upon targeted fine-tuning when length generalization is guaranteed by theory. Mechanistic analysis reveals that the asymmetry is connected to differences in the strength of induction versus anti-induction circuits within pretrained transformers. We validate our findings through practical experiments on real-world tasks demonstrating reliability risks. Our results highlight that pretraining selectively enhances certain transformer capabilities, but does not overcome fundamental length-generalization limits.
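The induction-versus-anti-induction asymmetry described above can be probed with simple synthetic sequences. A toy construction of one probe, under our own naming (the paper's exact task format may differ): the context contains a unique query token, and the target is the token to its right (induction) or left (anti-induction).

```python
import random

def make_retrieval_example(vocab, length, direction, seed=None):
    """Build one retrieval probe (toy sketch). The context contains a
    unique query token 'Q'; the prompt repeats 'Q' at the end, and the
    target is the earlier occurrence's right neighbour (induction) or
    left neighbour (anti-induction)."""
    rng = random.Random(seed)
    tokens = [rng.choice(vocab) for _ in range(length)]
    pos = rng.randrange(1, length - 1)  # keep both neighbours in range
    tokens[pos] = "Q"
    target = tokens[pos + 1] if direction == "induction" else tokens[pos - 1]
    prompt = tokens + ["Q"]  # the model must retrieve from the earlier 'Q'
    return prompt, target

prompt, target = make_retrieval_example(list("abcde"), 10, "induction", seed=0)
```

A model that only forms induction circuits will score well when `direction` is `"induction"` and poorly otherwise, which is the asymmetry the abstract reports.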
- Europe > Austria > Vienna (0.14)
- North America > United States (0.14)
- Europe > Germany > Saarland (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Unfair Learning: GenAI Exceptionalism and Copyright Law
This paper examines fair use legal arguments and eight distinct substantive arguments, contending that every legal and substantive argument favoring fair use for GenAI applies equally, if not more so, to humans. Therefore, granting GenAI exceptional privileges in this domain is legally and logically inconsistent with withholding broad fair use exemptions from individual humans.
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Law > Intellectual Property & Technology Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Media > Film (0.93)
- Law > Litigation (0.93)
Mimetic Initialization Helps State Space Models Learn to Recall
Trockman, Asher, Harutyunyan, Hrayr, Kolter, J. Zico, Kumar, Sanjiv, Bhojanapalli, Srinadh
Recent work has shown that state space models such as Mamba are significantly worse than Transformers on recall-based tasks because their state size is constant with respect to input sequence length. In practice, however, state space models have fairly large state sizes, and we conjecture that they should be able to perform much better at these tasks than previously reported. We investigate whether their poor copying and recall performance could be due in part to training difficulties rather than fundamental capacity constraints. Based on observations of their "attention" maps, we propose a structured initialization technique that allows state space layers to more readily mimic attention. Across a variety of architecture settings, our initialization makes it substantially easier for Mamba to learn to copy and perform associative recall from scratch.
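The associative-recall task mentioned above can be generated with a few lines of code. A toy construction (our own naming and format, not the paper's benchmark): a sequence of key-value pairs followed by a query key, where the target is that key's value.

```python
import random

def associative_recall_example(keys, values, n_pairs, seed=None):
    """One associative-recall probe (toy sketch): emit key-value pairs
    as a flat token sequence, then append a query key; the target is
    the value originally paired with that key. Keys and values should
    be disjoint vocabularies so the query is unambiguous."""
    rng = random.Random(seed)
    ks = rng.sample(keys, n_pairs)           # unique keys
    pairs = [(k, rng.choice(values)) for k in ks]
    seq = [tok for kv in pairs for tok in kv]
    query, target = rng.choice(pairs)
    return seq + [query], target

prompt, target = associative_recall_example(list("abcdef"), list("123"), 3, seed=0)
```

A model solves the task exactly when it can bind each key to the value that followed it, which is the capacity-versus-trainability question the abstract raises for fixed-size states.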
Language Models "Grok" to Copy
Lv, Ang, Xie, Ruobing, Sun, Xingwu, Kang, Zhanhui, Yan, Rui
We examine the pre-training dynamics of language models, focusing on their ability to copy text from preceding context--a fundamental skill for various LLM applications, including in-context learning (ICL) and retrieval-augmented generation (RAG). We propose a novel perspective that Transformer-based language models develop copying abilities similarly to grokking, which refers to sudden generalization on the test set long after the model has fit the training set. Our experiments yield three observations: (1) the pre-training loss decreases rapidly, while the context copying ability of models initially lags and then abruptly saturates; (2) the speed at which copying ability develops is independent of the number of tokens trained, much as grokking speed is unaffected by dataset size as long as the data distribution is preserved; (3) induction heads, the attention heads responsible for copying, form from shallow to deep layers during training, mirroring the development of circuits in deeper layers during grokking. We contend that the connection between grokking and context copying can provide valuable insights for more effective language model training, ultimately improving in-context performance. For example, we demonstrate that techniques that enhance grokking, such as regularization, either accelerate or enhance the development of context copying.
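The copying rule that induction heads implement can be emulated in one function: emit the token that followed the most recent earlier occurrence of the current token. A hedged sketch of that rule as a reference predictor (our own simplification, not the paper's measurement code):

```python
def induction_predict(tokens):
    """Predict the next token the way an induction head would: scan
    backwards for the most recent earlier occurrence of the last token
    and emit the token that followed it. Returns None when the last
    token has not been seen before (no copy target exists)."""
    last = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]
    return None

# In "a b c a b", the previous 'b' was followed by 'c'.
print(induction_predict(list("abcab")))  # 'c'
```

Comparing a model's next-token choices against this reference on repeated-context prompts gives a simple context-copying accuracy of the kind the abstract tracks over pre-training.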
Towards a Cyber Information Ontology
Limbaugh, David, Jensen, Mark, Beverley, John
This paper introduces a set of terms that are intended to act as an interface between cyber ontologies (like a file system ontology or a data fusion ontology) and top- and mid-level ontologies, specifically Basic Formal Ontology and the Common Core Ontologies. These terms center on what makes cyberinformation management unique: numerous acts of copying items of information, the aggregates of copies that result from those acts, and the faithful members of those aggregates that represent all other members.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Virginia > Fairfax County > Fairfax (0.04)
- North America > United States > New York > Erie County > Buffalo (0.04)
- (2 more...)
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
Chen, Tong, Asai, Akari, Mireshghallah, Niloofar, Min, Sewon, Grimmelmann, James, Choi, Yejin, Hajishirzi, Hannaneh, Zettlemoyer, Luke, Koh, Pang Wei
Evaluating the degree of reproduction of copyright-protected content by language models (LMs) is of significant interest to the AI and legal communities. Although both literal and non-literal similarities are considered by courts when assessing the degree of reproduction, prior research has focused only on literal similarities. To bridge this gap, we introduce CopyBench, a benchmark designed to measure both literal and non-literal copying in LM generations. Using copyrighted fiction books as text sources, we provide automatic evaluation protocols to assess literal and non-literal copying, balanced against the model utility in terms of the ability to recall facts from the copyrighted works and generate fluent completions. We find that, although literal copying is relatively rare, two types of non-literal copying -- event copying and character copying -- occur even in models as small as 7B parameters. Larger models demonstrate significantly more copying, with literal copying rates increasing from 0.2% to 10.5% and non-literal copying from 2.3% to 6.9% when comparing Llama3-8B and 70B models, respectively. We further evaluate the effectiveness of current strategies for mitigating copying and show that (1) training-time alignment can reduce literal copying but may increase non-literal copying, and (2) current inference-time mitigation methods primarily reduce literal but not non-literal copying.
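One simple proxy for the literal copying the benchmark measures is the longest contiguous character span shared between a generation and its source text. A minimal dynamic-programming sketch (our own proxy; CopyBench's actual evaluation protocol may differ):

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous substring shared by a and b,
    computed with a rolling DP row in O(len(a) * len(b)) time."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

source = "It was the best of times, it was the worst of times"
gen = "the worst of times indeed"
print(longest_common_substring(source, gen))  # 18 ("the worst of times")
```

Non-literal copying (reused events or characters under different surface wording) is exactly what a metric like this misses, which is why the benchmark evaluates both kinds separately.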
- Asia > Singapore (0.04)
- North America > United States > Alabama (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (4 more...)
- Law > Intellectual Property & Technology Law (1.00)
- Leisure & Entertainment (0.94)
The Flaw That Could Ruin Generative AI
And because an LLM doesn't "know" when it's quoting from training data, there's no obvious way to prevent the behavior. I spoke with Florian Tramèr, a prominent AI-security researcher and co-author of some of the above studies. It's "an extremely tricky problem to study," he told me. "It's very, very hard to pin down a good definition of memorization." One way to understand the concept is to think of an LLM as an enormous decision tree in which each node is an English word. From a given starting word, an LLM chooses the next word from the entire English vocabulary.
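The decision-tree picture in the excerpt can be made concrete with a toy bigram model: at each node (the current word), the model ranks every candidate next word and picks one. This is our own illustrative example, not how an LLM is actually built; real models condition on far more than the previous word.

```python
from collections import defaultdict

# Count word-to-next-word transitions over a tiny corpus.
corpus = "the cat sat on the mat and the cat slept".split()
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word(word):
    """Greedy choice at one 'node' of the tree: the most frequent
    continuation of `word` in the corpus, or None if unseen."""
    options = counts[word]
    return max(options, key=options.get) if options else None

print(next_word("the"))  # 'cat' -- seen twice, vs 'mat' once
```

Memorization in this picture is a path through the tree that the training data makes so probable that the model reproduces it verbatim.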
- Law > Litigation (1.00)
- Law > Intellectual Property & Technology Law (0.96)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.60)