Persistent and Unforgeable Watermarks for Deep Neural Networks
Huiying Li, Emily Willson, Haitao Zheng, Ben Y. Zhao
Abstract--As deep learning classifiers continue to mature, model providers with sufficient data and computation resources are exploring approaches to monetize the development of increasingly powerful models. Licensing models is a promising approach, but it requires a robust tool for owners to claim ownership of models, i.e., a watermark. Unfortunately, current watermarks are all vulnerable to piracy attacks, in which attackers embed forged watermarks into a model to dispute its ownership. We believe the properties of persistence and piracy resistance are critical to watermarks, but they are fundamentally at odds with the way models are currently trained and tuned. In this work, we propose two new training techniques (out-of-bound values and null-embedding) that provide persistence and limit the training of certain inputs into trained models. We then introduce wonder filters, a new primitive that embeds a persistent bit-sequence into a model, but only at initial training time. Wonder filters enable model owners to embed a bit-sequence generated from their private keys into a model at training time. Attackers cannot remove wonder filters via tuning, and cannot add their own filters to pretrained models. We provide analytical proofs of key properties, and experimentally validate them over a variety of tasks and models. Finally, we explore a number of adaptive countermeasures and show that our watermark remains robust.

Building deep neural networks (DNNs) is an expensive process. It requires significant resources, both in the form of extremely large training datasets and powerful computing hardware. For example, Google's InceptionV3 model, first proposed in 2015, is based on a sophisticated 48-layer architecture trained on 1.28M labeled images over 2 weeks on 8 GPUs. As a result, model training is increasingly limited to a small group of companies with sufficient access to both data and computation.

As the costs of these models continue to rise, model providers are exploring multiple approaches to monetize models and recoup their training costs. These include Machine Learning as a Service (MLaaS) platforms, where hosted models answer customer queries for a fee, and licensing models directly to third parties. Both approaches have serious limitations. Hosted models are vulnerable to a number of model inversion and inference attacks, while licensed models offer their owners little recourse against unauthorized redistribution without a reliable way to prove ownership. Ideally, DNN watermarks are capable of providing the proof of model ownership necessary for model licensing. Upon demand, a robust watermark would provide a persistent and verifiable link between the model (or any of its derivatives) and its owner. Such a watermark would require three properties.
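Before turning to those properties, the null-embedding mechanism sketched in the abstract can be made concrete. The following is a minimal PyTorch sketch, not the authors' reference implementation: the filter region is overwritten with out-of-bound pixel values far outside the normalized input range, and the model is trained so that filtered inputs keep their original labels. The names apply_filter, NULL_VALUE, and the loss weight lam are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

NULL_VALUE = 2000.0  # far outside the normalized input range [0, 1]

def apply_filter(x, mask, bits):
    """Overwrite masked pixels with out-of-bound values encoding `bits`.

    x:    (N, C, H, W) batch of normalized images
    mask: (H, W) boolean tensor marking the filter region
    bits: (H, W) float tensor of {0., 1.} giving each filter pixel's sign
    """
    patch = (bits * 2.0 - 1.0) * NULL_VALUE   # {0,1} -> {-V, +V}
    return torch.where(mask, patch, x)        # broadcasts over N and C

def null_embedding_step(model, opt, x, y, mask, bits, lam=1.0):
    """One training step mixing the normal task loss with the
    null-embedding loss: filtered inputs must keep their ORIGINAL
    labels, so the filter behaves as a no-op on the owner's model."""
    opt.zero_grad()
    clean_loss = F.cross_entropy(model(x), y)
    null_loss = F.cross_entropy(model(apply_filter(x, mask, bits)), y)
    loss = clean_loss + lam * null_loss
    loss.backward()
    opt.step()
    return loss.item()
```

Because the null behavior is learned during initial training on values the model otherwise never sees, later fine-tuning on in-range data gives an attacker little gradient signal with which to alter or re-embed it.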
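The abstract also states that the embedded bit-sequence is generated from the owner's private key. One plausible construction, sketched below under the assumption of RSA-PSS signatures and SHA-256 expansion (the paper's exact scheme may differ), signs an ownership claim and deterministically expands the signature into filter bits; anyone holding the public key and the signature can later verify the claim.

```python
import hashlib
import numpy as np
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

def signature_to_bits(sig: bytes, n_bits: int) -> np.ndarray:
    """Deterministically expand a signature into a {0,1} bit pattern
    by hashing it with an incrementing counter (assumed construction)."""
    buf = bytearray()
    counter = 0
    while len(buf) * 8 < n_bits:
        buf += hashlib.sha256(sig + counter.to_bytes(4, "big")).digest()
        counter += 1
    bits = np.unpackbits(np.frombuffer(bytes(buf), dtype=np.uint8))
    return bits[:n_bits]

# The owner signs an ownership claim; the signature seeds the filter
# pattern, and the public key lets a verifier check the claim later.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
claim = b"owner=alice;model=resnet50-v1"   # hypothetical claim string
signature = private_key.sign(
    claim,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256(),
)
filter_bits = signature_to_bits(signature, n_bits=8 * 8)  # e.g. an 8x8 patch
```

Under this sketch, forging a watermark would require producing a valid signature over a competing claim, which reduces disputes over the filter pattern to the unforgeability of the underlying signature scheme.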