AITopics | open-source code

open-source code

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

832ea0ff01bd512aab28bf416db9489c-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Feb-15-2026, 14:26:26 GMT

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
Asia > Macao (0.04)

Industry:

Law (0.68)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.73)
Information Technology > Artificial Intelligence > Vision (0.50)

832ea0ff01bd512aab28bf416db9489c-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Oct-9-2025, 00:07:47 GMT

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
Asia > Macao (0.04)

Industry:

Law (0.68)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.72)
Information Technology > Artificial Intelligence > Vision (0.50)

A First Look at License Compliance Capability of LLMs in Code Generation

Xu, Weiwei, Gao, Kai, He, Hao, Zhou, Minghui

arXiv.org Artificial Intelligence, Aug-5-2024

Recent advances in Large Language Models (LLMs) have revolutionized code generation, leading to widespread adoption of AI coding tools by developers. However, LLMs can generate license-protected code without providing the necessary license information, leading to potential intellectual property violations during software production. This paper addresses the critical, yet underexplored, issue of license compliance in LLM-generated code by establishing a benchmark to evaluate the ability of LLMs to provide accurate license information for their generated code. To establish this benchmark, we conduct an empirical study to identify a reasonable standard for "striking similarity" that excludes the possibility of independent creation, indicating a copy relationship between the LLM output and certain open-source code. Based on this standard, we propose an evaluation benchmark LiCoEval, to evaluate the license compliance capabilities of LLMs. Using LiCoEval, we evaluate 14 popular LLMs, finding that even top-performing LLMs produce a non-negligible proportion (0.88% to 2.01%) of code strikingly similar to existing open-source implementations. Notably, most LLMs fail to provide accurate license information, particularly for code under copyleft licenses. These findings underscore the urgent need to enhance LLM compliance capabilities in code generation tasks. Our study provides a foundation for future research and development to improve license compliance in AI-assisted software development, contributing to both the protection of open-source software copyrights and the mitigation of legal risks for LLM users.

license information, llm, similarity, (11 more...)

arXiv.org Artificial Intelligence

2408.02487

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Russia (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Law > Intellectual Property & Technology Law (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Are ChatGPT and Other Similar Systems the Modern Lernaean Hydras of AI?

Ioannidis, Dimitrios, Kepner, Jeremy, Bowne, Andrew, Bryant, Harriet S.

arXiv.org Artificial Intelligence, Jan-30-2024

The rise of Generative Artificial Intelligence systems ("AI systems") has created unprecedented social engagement. AI code generation systems provide responses (output) to questions or requests by accessing the vast library of open-source code created by developers over the past few decades. However, they do so by allegedly stealing the open-source code stored in virtual libraries, known as repositories. This Article focuses on how this happens and whether there is a solution that protects innovation and avoids years of litigation. We also touch upon the array of issues raised by the relationship between AI and copyright. Looking ahead, we propose the following: (a) immediate changes to the licenses for open-source code created by developers that will limit access and/or use of any open-source code to humans only; (b) we suggest revisions to the Massachusetts Institute of Technology ("MIT") license so that AI systems are required to procure appropriate licenses from open-source code developers, which we believe will harmonize standards and build social consensus for the benefit of all of humanity, rather than promote profit-driven centers of innovation; (c) we call for urgent legislative action to protect the future of AI systems while also promoting innovation; and (d) we propose a shift in the burden of proof to AI systems in obfuscation cases.

ai system, github, open-source code, (14 more...)

arXiv.org Artificial Intelligence

2306.09267

Country:

North America > United States > Wisconsin (0.04)
North America > United States > New York (0.04)
North America > United States > Indiana (0.04)
(5 more...)

Genre: Research Report (0.63)

Industry:

Law > Litigation (1.00)
Law > Intellectual Property & Technology Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.52)

Viral transmission in pedestrian crowds: Coupling an open-source code assessing the risks of airborne contagion with diverse pedestrian dynamics models

Nicolas, Alexandre, Mendez, Simon

arXiv.org Artificial Intelligence, Dec-4-2023

We study viral transmission in crowds via the short-ranged airborne pathway using a purely model-based approach. Our goal is two-pronged. Firstly, we illustrate with a concrete and pedagogical case study how to estimate the risks of new viral infections by coupling pedestrian simulations with the transmission algorithm that we recently released as open-source code. The algorithm hinges on pre-computed viral concentration maps derived from computational fluid dynamics (CFD) simulations. Secondly, we investigate to what extent the transmission risk predictions depend on the pedestrian dynamics model in use. For the simple bidirectional flow under consideration, the predictions are found to be surprisingly stable across initial conditions and models, despite the different microscopic arrangements of the simulated crowd, as long as the crowd evolves in a qualitatively similar way. On the other hand, when major changes are observed in the crowd's behaviour, notably whenever a jam occurs at the centre of the channel, the estimated risks surge drastically.

simulation, transmission, viral transmission, (15 more...)

arXiv.org Artificial Intelligence

2312.01779

Country: Europe > France > Occitanie > Hérault > Montpellier (0.05)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.87)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Whose Text Is It Anyway? Exploring BigCode, Intellectual Property, and Ethics

Choksi, Madiha Zahrah, Goedicke, David

arXiv.org Artificial Intelligence, Apr-5-2023

Intelligent or generative writing tools rely on large language models that recognize, summarize, translate, and predict content. This position paper probes the copyright interests of open data sets used to train large language models (LLMs). Our paper asks, how do LLMs trained on open data sets circumvent the copyright interests of the used data? We start by defining software copyright and tracing its history. We rely on GitHub Copilot as a modern case study challenging software copyright. Our conclusion outlines obstacles that generative writing assistants create for copyright, and offers a practical road map for copyright analysis for developers, software law experts, and general users to consider in the context of intelligent LLM-powered writing tools.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2304.02839

Country: North America > United States > New York > New York County > New York City (0.06)

Genre: Research Report (0.70)

Industry: Law > Intellectual Property & Technology Law (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

The (ab)use of Open Source Code to Train Large Language Models

Al-Kaswan, Ali, Izadi, Maliheh

arXiv.org Artificial Intelligence, Feb-28-2023

In recent years, Large Language Models (LLMs) have gained significant popularity due to their ability to generate human-like text and their potential applications in various fields, such as Software Engineering. LLMs for Code are commonly trained on large unsanitized corpora of source code scraped from the Internet. The content of these datasets is memorized and emitted by the models, often in a verbatim manner. In this work, we will discuss the security, privacy, and licensing implications of memorization. We argue why the use of copyleft code to train LLMs is a legal and ethical dilemma. Finally, we provide four actionable recommendations to address this issue.

information, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2302.13681

Country: Europe > Netherlands > South Holland > Delft (0.06)

Genre: Research Report (0.83)

Industry: Law (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Machine Learning Is Not Your Copilot: AI System Accused of Violating Open Source Copyright Licenses

#artificialintelligence, Jan-10-2023, 23:45:21 GMT

As previously reported in this space, the Court of Appeals for the Federal Circuit has ruled that an AI machine cannot be an inventor because it is not a "natural person." You can read those posts here and here. On November 11, 2022, a group of plaintiffs filed suit in the Northern District of California against several defendants, including GitHub, Inc., Microsoft Corporation, OpenAI, Inc., and companies related to OpenAI. The issue stems from a product called Copilot and a product integrated into Copilot called Codex. To provide some context for the issue, some backstory may help.

large language model, machine learning, natural language, (21 more...)

#artificialintelligence

Country: North America > United States > California (0.28)

Industry:

Law > Litigation (1.00)
Government > Regional Government > North America Government > United States Government (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.50)

How open-source software shapes AI policy

#artificialintelligence, Aug-10-2021, 15:21:22 GMT

Open-source software quietly affects nearly every issue in AI policy, but it is largely absent from discussions around AI policy; policymakers need to more actively consider OSS's role in AI. Open-source software (OSS), software that is free to access, use, and change without restrictions, plays a central role in the development and use of artificial intelligence (AI). Across open-source programming languages such as Python, R, C, Java, Scala, JavaScript, Julia, and others, there are thousands of implementations of machine learning algorithms. OSS frameworks for machine learning, including tidymodels in R and scikit-learn in Python, have helped consolidate many diverse algorithms into a consistent machine learning process and enabled far easier use for the everyday data scientist. There are also OSS tools specific to the especially important subfield of deep learning, which is dominated by Google's TensorFlow and Facebook's PyTorch.

algorithm, data scientist, google and facebook, (12 more...)

#artificialintelligence

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Industry:

Government > Regional Government > North America Government > United States Government (0.69)
Information Technology > Services (0.67)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.72)

How secure are your AI and machine learning projects?

#artificialintelligence, Nov-28-2020, 04:50:07 GMT

When enterprises adopt new technology, security is often on the back burner. It can seem more important to get new products or services to customers and internal users as quickly as possible and at the lowest cost. Good security can be slow and expensive. Artificial intelligence (AI) and machine learning (ML) offer all the same opportunities for vulnerabilities and misconfigurations as earlier technological advances, but they also have unique risks. As enterprises embark on major AI-powered digital transformations, those risks may become greater.

ai and ml project, algorithm, information, (13 more...)

#artificialintelligence

AI-Alerts: 2020 > 2020-12 > AAAI AI-Alert for Dec 1, 2020 (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.88)
