AITopics | memfree

memfree

Yangsibo Huang, Noah A. Smith

Neural Information Processing Systems, Feb-18-2026, 19:09:03 GMT

When turned on, "GitHub Copilot checks code completion suggestions with their surrounding

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

South America > Peru (0.14)
North America > Belize (0.14)
North America > Mexico (0.14)
(9 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Law > Intellectual Property & Technology Law (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.93)
Information Technology (0.93)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Yangsibo Huang, Noah A. Smith

Neural Information Processing Systems, Oct-10-2025, 22:13:29 GMT

When turned on, "GitHub Copilot checks code completion suggestions with their surrounding

blocklisted content, evaluation, memfree, (13 more...)

Neural Information Processing Systems

Country:

South America > Peru (0.14)
North America > Belize (0.14)
North America > Mexico (0.14)
(9 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Law > Intellectual Property & Technology Law (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.93)
Information Technology (0.93)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Certified Mitigation of Worst-Case LLM Copyright Infringement

Zhang, Jingyu, Yu, Jiacan, Marone, Marc, Van Durme, Benjamin, Khashabi, Daniel

arXiv.org Artificial Intelligence, Apr-24-2025

The exposure of large language models (LLMs) to copyrighted material during pre-training raises concerns about unintentional copyright infringement post deployment. This has driven the development of "copyright takedown" methods, post-training approaches aimed at preventing models from generating content substantially similar to copyrighted works. While current mitigation approaches are somewhat effective for average-case risks, we demonstrate that they overlook worst-case copyright risks exhibited by the existence of long, verbatim quotes from copyrighted sources. We propose BloomScrub, a remarkably simple yet highly effective inference-time approach that provides certified copyright takedown. Our method repeatedly interleaves quote detection with rewriting techniques to transform potentially infringing segments. By leveraging efficient data sketches (Bloom filters), our approach enables scalable copyright screening even for large-scale real-world corpora. When quotes beyond a length threshold cannot be removed, the system can abstain from responding, offering certified risk reduction. Experimental results show that BloomScrub reduces infringement risk, preserves utility, and accommodates different levels of enforcement stringency with adaptive abstention. Our results suggest that lightweight, inference-time methods can be surprisingly effective for copyright prevention.
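The detect, rewrite, and abstain loop described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `BloomFilter` class, the whitespace tokenization, the parameters `n`, `max_quote_len`, and `max_rounds`, and the `rewrite` callback (a stand-in for whatever rewriting model is used) are all assumptions made for the example.

```python
import hashlib
from typing import Callable, List, Optional


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str) -> List[int]:
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big"
            ) % self.num_bits
            for i in range(self.num_hashes)
        ]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # May return false positives, never false negatives.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def ngrams(tokens: List[str], n: int) -> List[str]:
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_filter(corpus: List[str], n: int) -> BloomFilter:
    """Index every word n-gram of the protected corpus (assumed tokenization: split on whitespace)."""
    bf = BloomFilter()
    for doc in corpus:
        for gram in ngrams(doc.split(), n):
            bf.add(gram)
    return bf


def longest_quote(text: str, bf: BloomFilter, n: int) -> int:
    """Length in tokens of the longest run of consecutive n-gram hits (0 if none)."""
    best = run = 0
    for gram in ngrams(text.split(), n):
        if gram in bf:
            run += 1
            best = max(best, run + n - 1)
        else:
            run = 0
    return best


def scrub(
    text: str,
    bf: BloomFilter,
    rewrite: Callable[[str], str],  # hypothetical rewriting step
    n: int = 3,
    max_quote_len: int = 4,
    max_rounds: int = 3,
) -> Optional[str]:
    """Alternate detection and rewriting; return None (abstain) if long quotes remain."""
    for _ in range(max_rounds):
        if longest_quote(text, bf, n) <= max_quote_len:
            return text
        text = rewrite(text)
    return text if longest_quote(text, bf, n) <= max_quote_len else None
```

Because a Bloom filter has no false negatives, a response that passes the final check contains no indexed quote longer than the threshold, which is one way to read the "certified" guarantee; false positives can only make the check more conservative.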

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2504.16046

Country:

North America > United States (1.00)
Asia (0.93)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Intellectual Property & Technology Law (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Evaluating Copyright Takedown Methods for Language Models

Wei, Boyi, Shi, Weijia, Huang, Yangsibo, Smith, Noah A., Zhang, Chiyuan, Zettlemoyer, Luke, Li, Kai, Henderson, Peter

arXiv.org Artificial Intelligence, Jul-11-2024

Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material. These models can memorize and generate content similar to their training data, posing potential concerns. Therefore, model creators are motivated to develop mitigation methods that prevent generating protected content. We term this procedure copyright takedowns for LMs, noting the conceptual similarity to (but legal distinction from) the DMCA takedown. This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs. We propose CoTaEval, an evaluation framework to assess the effectiveness of copyright takedown methods, the impact on the model's ability to retain uncopyrightable factual knowledge from the training data whose recitation is embargoed, and how well the model maintains its general utility and efficiency. We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches. Our findings indicate that no tested method excels across all metrics, showing significant room for research in this unique problem setting and indicating potential unresolved challenges for live policy proposals.
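As a rough illustration of what a decoding-time filtering intervention can look like, the sketch below blocks any candidate token that would complete an n-gram found in a blocklist, in the spirit of n-gram blocking methods such as MemFree. It is not the configuration evaluated in the paper: the exact-match `set` blocklist, the greedy decoder, and the `next_token_scores` callback (a stand-in for a real language model) are assumptions chosen to keep the example small.

```python
from typing import Callable, Dict, List, Set, Tuple

NGram = Tuple[str, ...]


def build_blocklist(docs: List[str], n: int) -> Set[NGram]:
    """Collect every word n-gram of the blocklisted documents (whitespace tokens)."""
    blocked: Set[NGram] = set()
    for doc in docs:
        tokens = doc.split()
        for i in range(len(tokens) - n + 1):
            blocked.add(tuple(tokens[i:i + n]))
    return blocked


def filtered_greedy_decode(
    next_token_scores: Callable[[List[str]], Dict[str, float]],  # hypothetical model
    prompt: List[str],
    blocklist: Set[NGram],
    n: int,
    max_new_tokens: int,
) -> List[str]:
    """Greedy decoding that never emits a token completing a blocklisted n-gram."""
    if n < 2:
        raise ValueError("n must be at least 2")
    out = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(out)
        prefix = tuple(out[-(n - 1):])
        allowed = {
            tok: score
            for tok, score in scores.items()
            # With fewer than n-1 context tokens no full n-gram can be formed yet.
            if len(prefix) < n - 1 or prefix + (tok,) not in blocklist
        }
        if not allowed:
            break  # every candidate is blocked; stop generating
        out.append(max(allowed, key=allowed.get))
    return out
```

A filter of this kind only suppresses exact n-gram matches, so near-verbatim paraphrases pass through unchanged.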

blocklisted content, memfree, similarity, (13 more...)

arXiv.org Artificial Intelligence

2406.18664

Country:

South America > Peru (0.14)
North America > Belize (0.14)
North America > Mexico (0.14)
(7 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.93)
Government > Regional Government > North America Government > United States Government (0.93)
Leisure & Entertainment > Sports > Baseball (0.93)
Law > Intellectual Property & Technology Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
