The unprecedented interest, investment, and deployment of machine learning across many aspects of our lives in the past decade has come at a cost. Although there has been some movement toward moderating machine learning where it has been genuinely harmful, it is becoming increasingly clear that existing approaches suffer significant shortcomings. Nevertheless, new directions still exist that hold potential for meaningfully addressing the harms of machine learning. In particular, new approaches to licensing the code and models that underlie these systems have the potential to meaningfully shape how they affect our world. This is Part I of a two-part essay.
Artificial intelligence (AI) is set to transform many aspects of our lives, including our homes and health. AI is already widely used in internet search and in home devices with speech recognition, but in the near future it will become even more widespread. This will have significant repercussions as AI performs many tasks that until now could only be undertaken by humans, removing human intervention from much of the picture. This will particularly affect intellectual property law.
When Michelle K. Lee, '88, SM '89, was sworn in as the director of the U.S. Patent and Trademark Office in 2015, she saw an opportunity. The agency was a bit behind on digital transformation and adopting technologies like cloud computing and artificial intelligence, but it had mountains of data -- more than 10 million patents issued since the office opened in 1802, and 600,000 patent applications received each year. Lee led a project to use data and analytics to modernize the agency, such as implementing AI solutions to improve patent searches and the speed and quality of patents issued. By gathering data about how patent examiners make decisions, and identifying outlying behavior, the office could also pinpoint areas in which examiners would benefit from targeted training. "If the U.S. Patent and Trademark Office, a 200-plus-year-old governmental agency, has a machine learning opportunity, so too does every organization," Lee said during a presentation at EmTech Digital, hosted by MIT Technology Review.
What does a chair from furniture manufacturer Kartell have in common with a rocket engine by the software powerhouse Hyperganic? They were both created by generative design -- in other words, made by AI. But it's a far cry from simple CAD design: generative design uses AI-driven algorithms to generate a first set of designs for a product based on certain input parameters, then continues to refine these designs with each iteration until the final product materialises. Combined with industrial 3D printing, the result is a technically superior product that weighs less, has better functional features, and is often less prone to wear and tear.
Two key components for using ML responsibly provide a prudent "start here" for organizations: model explainability and data transparency. The inability to explain why a model arrived at a particular result presents a level of risk in nearly every industry. In some areas, like healthcare, the stakes are particularly high when a model could be presenting a recommendation for patient care. In financial services, regulators may need to know why a lender is making a loan. Data transparency can ensure there is no unfair or unintended bias in the training data sets used to build the model, which can lead to disparate impact for protected classes – and consumers have what is increasingly a legally protected right to know how their data is being used.
But how can organizations developing ML models enforce explainability and transparency standards when doing so might mean sharing with the public the very features, data sets, and model frameworks that represent that organization's proprietary intellectual property (IP)? Given machine learning's complexity and interdisciplinary nature, executives should employ a wide variety of approaches to manage the associated risks, which include building risk management into model development and applying holistic risk frameworks that leverage and adapt principles used in managing other types of enterprise risk. Whereas standard technical documentation is created to help practitioners implement a model, documentation focused on explainability and transparency informs consumers, regulators, and others about why and how a model or data set is being used. Such documentation includes a high-level overview of the model itself, including: its intended purpose, performance, and provenance; information about the training data set and training process; known issues or tradeoffs with the model; identified risk mitigation strategies; and any other information that can help contextualize the technology. Similarly, model documentation can become the proxy for sharing the model and its features and data sets with the world as opposed to sharing the actual "cookie recipe."
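The documentation fields described above can be sketched as a simple structured record. The schema and field values below are purely illustrative, not a standard format or any organization's actual model card:

```python
# A hypothetical model-documentation record covering the fields the
# paragraph lists: intended purpose, performance, provenance, training
# data, known issues, and risk mitigations. All names/values are made up.
model_card = {
    "model_name": "loan-approval-classifier",  # hypothetical example
    "intended_purpose": "Rank consumer loan applications for human review",
    "performance": {"auc": 0.87, "evaluation_set": "2020 holdout"},
    "provenance": "Trained in-house; gradient-boosted trees",
    "training_data": {
        "source": "Anonymized application records, 2015-2019",
        "preprocessing": "Protected-class attributes excluded",
    },
    "known_issues": ["Under-represents thin-file applicants"],
    "risk_mitigations": ["Quarterly disparate-impact audit"],
}

def summarize(card: dict) -> str:
    """Render a short human-readable summary for consumers and regulators."""
    return f"{card['model_name']}: {card['intended_purpose']}"
```

A record like this can be published in place of the model itself: it tells regulators and consumers why and how the model is used without disclosing the "cookie recipe."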
Due to the wide use of highly valuable, large-scale deep neural networks (DNNs), it has become crucial to protect the intellectual property of DNNs so that the ownership of disputed or stolen DNNs can be verified. Most existing solutions embed backdoors during DNN model training such that ownership can be verified by triggering distinguishable model behaviors with a set of secret inputs. However, such solutions are vulnerable to model fine-tuning and pruning. They also suffer from fraudulent ownership claims, as attackers can discover adversarial samples and use them as secret inputs to trigger distinguishable behaviors from stolen models. To address these problems, we propose a novel DNN watermarking solution, named HufuNet, for protecting the ownership of DNN models. We evaluate HufuNet rigorously on four benchmark datasets with five popular DNN models, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The experiments demonstrate that HufuNet is highly robust against model fine-tuning/pruning, kernel cutoff/supplement, functionality-equivalent attacks, and fraudulent ownership claims, making it highly promising for protecting large-scale DNN models in the real world.
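The baseline backdoor-watermarking idea that the paragraph critiques can be sketched as follows: the owner keeps a secret trigger set, queries the suspect model on it, and a high match rate with the secret target labels supports an ownership claim. This is a minimal illustration of that trigger-set approach, not HufuNet's actual mechanism; the function names and the 0.9 threshold are assumptions:

```python
# Trigger-set ownership verification (the backdoor baseline): query a
# suspect model on secret inputs and check agreement with secret labels.
def verify_ownership(predict, trigger_inputs, target_labels, threshold=0.9):
    matches = sum(
        1 for x, y in zip(trigger_inputs, target_labels) if predict(x) == y
    )
    match_rate = matches / len(trigger_inputs)
    return match_rate >= threshold, match_rate

# Usage with a stand-in "model" that preserves the embedded backdoor:
# inputs starting with the trigger prefix map to the secret label 7.
secret_triggers = ["trig-1", "trig-2", "trig-3", "trig-4"]
secret_labels = [7, 7, 7, 7]
suspect_model = lambda x: 7 if x.startswith("trig") else 0
claimed, rate = verify_ownership(suspect_model, secret_triggers, secret_labels)
```

As the paragraph notes, this scheme breaks down when fine-tuning or pruning erases the backdoor behavior, and an attacker who finds adversarial samples can stage the same verification fraudulently.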
The Korean Intellectual Property Office (KIPO) announced Patent Examination Guidelines for key technology areas related to the Fourth Industrial Revolution, including machine learning based artificial intelligence ("AI"), on January 18, 2021. In the Examination Guidelines for AI, KIPO outlines specific guidelines on description and novelty/inventiveness requirements for different categories of AI inventions (e.g., AI model training inventions and AI application inventions), in addition to eligibility requirements that correspond to those of computer-related inventions. In particular, KIPO's Examination Guidelines provide examples of various AI inventions with practical drafting tips on enablement (Article 42(3)(i) of the Patent Act) and inventiveness requirements (Article 29(2)). Under Article 42(3)(i), the description of an invention shall be written clearly and fully so that a person with ordinary skill in the art (POSITA) to which the invention pertains can easily practice the claimed invention. For an AI invention, KIPO suggests that the description include the technical problem, solution, and specific technical configuration (e.g., training data, data preprocessing, trained model, and loss function) to enable a POSITA to practice the claimed invention, unless the technical configuration is well known in the art.
This article was originally published by Industry Today on March 3, 2021, and is reproduced below in full with permission. With rapid changes, pressure to innovate, and accelerating implementation of advanced technology across all stages of the supply chain over the past year, there are important intellectual property (IP) considerations that companies need to address to protect their inventions. Leading-edge technologies like augmented and virtual reality, machine learning and artificial intelligence, and 3D printing have become integral to business success, yet continue to cause confusion around how they should be patented. This article explores some of the nuances of protecting the software that fuels these advanced innovations, and important considerations in the current environment. Most machine learning (ML) and artificial intelligence (AI) innovations are generally based in computer software. While courts and the U.S. Patent and Trademark Office ("U.S. PTO") have established limits on the ability to patent computer software, it is still possible to obtain meaningful, broad, and valuable patent protection for it.
In this paper we present a method to concatenate patent claims with their own descriptions. By applying this method, BERT is trained to associate claims with suitable descriptions. Such a trained BERT (claim-to-description BERT) can identify novelty-relevant descriptions for patents. In addition, we introduce a new scoring scheme, relevance (or novelty) scoring, to process the output of BERT in a meaningful way. We tested the method on patent applications by training BERT on the first claims of patents and their corresponding descriptions. BERT's output was processed according to the relevance score, and the results were compared with the X documents cited in the search reports. The test showed that BERT scored some of the cited X documents as highly relevant.
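The scoring step can be sketched as follows: a pairwise scorer assigns each (claim, description) pair a relevance score, and candidate documents are ranked so the most novelty-relevant ones surface first. The toy token-overlap scorer below is a stand-in for the fine-tuned claim-to-description BERT, and all names are illustrative, not the paper's actual implementation:

```python
# Rank candidate descriptions by a pluggable pairwise relevance scorer.
def rank_by_relevance(claim, descriptions, score_pair):
    scored = [(score_pair(claim, d), d) for d in descriptions]
    scored.sort(key=lambda t: t[0], reverse=True)  # most relevant first
    return scored

# Toy scorer: token overlap as a stand-in for BERT's pair score.
def overlap_score(claim, description):
    c = set(claim.lower().split())
    d = set(description.lower().split())
    return len(c & d) / max(len(c), 1)

claim = "a rotor blade with a cooled leading edge"
docs = [
    "cooled rotor blade leading edge design",
    "a method of brewing tea",
]
ranking = rank_by_relevance(claim, docs, overlap_score)
```

In the paper's setting, the ranked list would then be compared against the X documents cited in the examiner's search report to check whether the highly scored candidates coincide with the novelty-destroying prior art.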