Task-Agnostic Language Model Watermarking via High Entropy Passthrough Layers
Masrani, Vaden, Akbari, Mohammad, Yue, David Ming Xuan, Rezaei, Ahmad, Zhang, Yong
In the era of costly pre-training of large language models, ensuring the intellectual property rights of model owners, and ensuring that said models are responsibly deployed, is becoming increasingly important. To this end, we propose model watermarking via passthrough layers, which are added to existing pre-trained networks and trained using a self-supervised loss such that the model produces high-entropy output when prompted with a unique private key, and acts normally otherwise. Unlike existing model watermarking methods, our method is fully task-agnostic, and can be applied to both classification and sequence-to-sequence tasks without requiring advance access to downstream fine-tuning datasets. We evaluate the proposed passthrough layers on a wide range of downstream tasks, and show experimentally that our watermarking method achieves near-perfect watermark extraction accuracy and false-positive rates in most cases without damaging original model performance. Additionally, we show our method is robust to downstream fine-tuning, fine-pruning, and layer-removal attacks, and can be trained in a fraction of the time required to train the original model. Code is available in the paper.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
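The key-triggered behavior described in the abstract above can be sketched numerically. The cosine-similarity trigger, the threshold value, and the noise injection below are illustrative assumptions for intuition only, not the paper's actual passthrough-layer architecture or training loss:

```python
import numpy as np

rng = np.random.default_rng(0)

class PassthroughLayer:
    """Illustrative sketch: act as the identity unless the private key is
    detected, in which case emit noise so downstream logits are high entropy."""

    def __init__(self, key_embedding, threshold=0.95):
        self.key = key_embedding / np.linalg.norm(key_embedding)
        self.threshold = threshold  # cosine-similarity trigger threshold (assumed)

    def forward(self, x):
        # x: (batch, dim) hidden states; compare each row to the key direction
        sims = (x / np.linalg.norm(x, axis=1, keepdims=True)) @ self.key
        triggered = sims > self.threshold
        out = x.copy()
        # Replace triggered rows with random noise instead of passing them through
        out[triggered] = rng.standard_normal((int(triggered.sum()), x.shape[1]))
        return out, triggered

dim = 16
key = rng.standard_normal(dim)
layer = PassthroughLayer(key)
batch = rng.standard_normal((4, dim))
batch[0] = 3.0 * key  # row 0 carries the private key; other rows are ordinary inputs
_, triggered = layer.forward(batch)
print(triggered)  # only the key-bearing row should fire
```

Watermark extraction then amounts to querying the deployed model with the private key and checking for abnormally high output entropy.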
Embed and Project: Discrete Sampling with Universal Hashing
We consider the problem of sampling from a probability distribution defined over a high-dimensional discrete set, specified for instance by a graphical model. We propose a sampling algorithm, called PAWS, based on embedding the set into a higher-dimensional space which is then randomly projected using universal hash functions to a lower-dimensional subspace and explored using combinatorial search methods. Our scheme can leverage fast combinatorial optimization tools as a blackbox and, unlike MCMC methods, samples produced are guaranteed to be within an (arbitrarily small) constant factor of the true probability distribution. We demonstrate that by using state-of-the-art combinatorial search tools, PAWS can efficiently sample from Ising grids with strong interactions and from software verification instances, while MCMC and variational methods fail in both cases.
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- Asia > Middle East > Jordan (0.04)
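The universal-hashing projection at the heart of PAWS can be illustrated with the standard pairwise-independent family h(x) = Ax + b (mod 2): each output bit is a random parity constraint, and each constraint halves the solution set in expectation. The brute-force bucket enumeration below is a toy simplification of the full embed-and-project scheme, which would use a combinatorial solver instead:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

n, m = 6, 3  # project 6-bit assignments through m random parity constraints

# A random member of the universal family h(x) = A x + b (mod 2)
A = rng.integers(0, 2, size=(m, n))
b = rng.integers(0, 2, size=m)

def hash_bits(x):
    # Each output bit is the XOR (mod-2 sum) of a random subset of x, offset by b
    return (A @ x + b) % 2

# Keep only assignments hashing to the all-zeros bucket; by pairwise
# independence, roughly 2**(n - m) assignments survive in expectation.
survivors = [x for x in product([0, 1], repeat=n)
             if not hash_bits(np.array(x)).any()]
print(len(survivors))  # expected to be around 2**(n - m) = 8
```

In the actual algorithm, a combinatorial search tool finds surviving assignments without enumeration, which is what makes the approach scale to high-dimensional sets.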
Improved Prototypical Semi-Supervised Learning with Foundation Models: Prototype Selection, Parametric vMF-SNE Pretraining and Multi-view Pseudolabelling
Mannix, Evelyn, Bondell, Howard
In this paper we present an improved approach to prototypical semi-supervised learning for computer vision, in the context of leveraging a frozen foundation model as the backbone of our neural network. As a general tool, we propose parametric von-Mises Fisher Stochastic Neighbour Embedding (vMF-SNE) to create mappings with neural networks between high-dimensional latent spaces that preserve local structure. This enables us to pretrain the projection head of our network using the high-quality embeddings of the foundation model with vMF-SNE. We also propose soft multi-view pseudolabels, where predictions across multiple views are combined to provide a more reliable supervision signal compared to a consistency or swapped assignment approach. We demonstrate that these ideas improve upon Predicting View-Assignments with Support Samples (PAWS), a current state-of-the-art semi-supervised learning method, as well as Robust PAWS (RoPAWS), over a range of benchmarking datasets. We also introduce simple $k$-means prototype selection, a technique that provides superior performance to other unsupervised label selection approaches in this context. These changes improve upon PAWS by an average of +2.9% for CIFAR-10 and +5.7% for CIFAR-100 with four labels per class, and by +15.2% for DeepWeeds, a particularly challenging dataset for semi-supervised learning. We also achieve new state-of-the-art results in semi-supervised learning in this small label regime for CIFAR-10 - 95.8% (+0.7%) and CIFAR-100 - 76.6% (+12.0%).
- Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
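The soft multi-view pseudolabel idea from the abstract above can be sketched as an average-then-sharpen step over per-view predictions. The temperature sharpening shown here is a common semi-supervised convention assumed for illustration, not necessarily the authors' exact formulation:

```python
import numpy as np

def soft_multiview_pseudolabel(view_probs, temperature=0.25):
    """Combine per-view class predictions into one soft pseudolabel.

    view_probs: (n_views, n_classes) class probabilities, one row per
    augmented view of the same image. Averaging across views and then
    sharpening yields a supervision signal more reliable than any single
    view (an assumed recipe; temperature=0.25 is arbitrary).
    """
    mean = view_probs.mean(axis=0)            # consensus across views
    sharpened = mean ** (1.0 / temperature)   # temperature sharpening
    return sharpened / sharpened.sum()        # renormalize to a distribution

# Two views that mostly agree on class 0 yield a confident soft pseudolabel.
views = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1]])
label = soft_multiview_pseudolabel(views)
print(label.round(3))
```

Because the views are averaged before sharpening, a single noisy view cannot flip the pseudolabel on its own, which is the claimed advantage over consistency or swapped-assignment targets.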
Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt
Ye, Seonghyeon, Jang, Joel, Kim, Doyoung, Jo, Yongrae, Seo, Minjoon
Enhancing the zero-shot performance of instruction-following models requires heavy computation, either by scaling the total number of training datasets or the model size. In this work, we explore how retrieval of soft prompts obtained through prompt tuning can efficiently assist hard prompts in zero-shot task generalization. Specifically, we train soft prompt embeddings for each prompt through prompt tuning, store the samples of the training instances mapped with the prompt embeddings, and retrieve the corresponding prompt embedding of the training instance closest to the query instance during inference. While adding only 0.007% additional parameters, retrieval of soft prompts enhances the performance of T0 on unseen tasks, outperforming it on 10 out of 11 datasets and improving the mean accuracy of T0 on the BIG-bench benchmark by 2.39 percentage points. Also, we report an interesting finding that retrieving source embeddings trained on similar answer-choice formats is more important than retrieving those trained on similar task types.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > China > Hong Kong (0.04)
- (13 more...)
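The store-and-retrieve step described in the abstract above reduces, at its core, to nearest-neighbor lookup over stored embeddings. The key/value layout, shapes, and cosine-similarity metric below are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical store mapping a representative input embedding per training
# task ("key") to that task's tuned soft-prompt embedding ("value").
n_tasks, input_dim, prompt_len, emb_dim = 5, 8, 4, 8
task_keys = rng.standard_normal((n_tasks, input_dim))
soft_prompts = rng.standard_normal((n_tasks, prompt_len, emb_dim))

def retrieve_soft_prompt(query_embedding):
    # Cosine similarity between the query instance and each stored key
    keys = task_keys / np.linalg.norm(task_keys, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    nearest = int(np.argmax(keys @ q))
    # The retrieved soft prompt would be prepended to the query's input
    # embeddings at inference time.
    return soft_prompts[nearest], nearest

# A query that is a slight perturbation of task 3's key retrieves task 3.
query = task_keys[3] + 0.01 * rng.standard_normal(input_dim)
prompt, idx = retrieve_soft_prompt(query)
print(idx)
```

Since only the small soft-prompt table is added on top of the frozen model, the parameter overhead stays tiny, consistent with the 0.007% figure reported.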
It Costs Just $400 to Build an AI Disinformation Machine
In May, Sputnik International, a state-owned Russian media outlet, posted a series of tweets lambasting US foreign policy and attacking the Biden administration. Each prompted a curt but well-crafted rebuttal from an account called CounterCloud, sometimes including a link to a relevant news or opinion article. It generated similar responses to tweets by the Russian embassy and Chinese news outlets criticizing the US. Russian criticism of the US is far from unusual, but CounterCloud's material pushing back was: The tweets, the articles, and even the journalists and news sites were crafted entirely by artificial intelligence algorithms, according to the person behind the project, who goes by the name Nea Paw and says it is designed to highlight the danger of mass-produced AI disinformation. Paw did not post the CounterCloud tweets and articles publicly but provided them to WIRED and also produced a video outlining the project.
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Mechanical Evidence for the Phylogenetic Origin of the Red Panda's False Thumb as an Adaptation to Arboreal Locomotion
Barnett, Braden, Lyu, Yiqi, Pichney, Kyle, Sun, Brian, Wu, Jixiao
We constructed a modular, biomimetic red panda paw with which to experimentally investigate the evolutionary reason for the existence of the false thumbs of red pandas. These thumbs were once believed to have shared a common origin with the similar false thumbs of giant pandas; however, the discovery of a carnivorous fossil ancestor of the red panda that had false thumbs implies that the red panda did not evolve its thumbs to assist in eating bamboo, as the giant panda did, but rather evolved its thumbs for some other purpose. The leading proposal for this purpose is that the thumbs developed to aid arboreal locomotion. To test this hypothesis, we conducted grasp tests on rods 5-15 mm in diameter using a biomimetic paw with 0-16 mm interchangeable thumb lengths. The results of these tests demonstrated an optimal thumb length of 7 mm, which is just above the red panda's true thumb length of 5.5 mm. Given trends in the data suggesting that smaller thumbs are better suited to grasping larger-diameter rods, we conclude that the red panda's thumb being sized below the optimum length indicates an adaptation toward grasping branches as opposed to relatively thinner food items, supporting the new proposal that the red panda's thumbs are primarily an adaptation for climbing rather than food manipulation.
GAPX: Generalized Autoregressive Paraphrase-Identification X
Zhou, Yifei, Li, Renyu, Housen, Hayden, Lim, Ser-Nam
Paraphrase identification is a fundamental task in Natural Language Processing. While much progress has been made in the field, the performance of many state-of-the-art models often suffers from distribution shift at inference time. We verify that a major source of this performance drop comes from biases introduced by negative examples. To overcome these biases, we propose in this paper to train two separate models, one that only utilizes the positive pairs and the other the negative pairs. This gives us the option of deciding how much to utilize the negative model, for which we introduce a perplexity-based out-of-distribution metric that we show can effectively and automatically determine how much weight it should be given during inference. We support our findings with strong empirical results.
- Asia > India (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Sweden > Uppsala County > Uppsala (0.04)
- (3 more...)
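The weighting scheme sketched in the GAPX abstract above can be illustrated as a perplexity-gated blend of the two models' scores. The exponential-decay weighting and the threshold constant below are illustrative choices, not the paper's actual formula:

```python
import numpy as np

def combine_scores(pos_score, neg_score, neg_perplexity, ppl_threshold=50.0):
    """Blend the positive-pair and negative-pair models' paraphrase scores.

    The weight on the negative model decays as its perplexity on the input
    rises, i.e. as the input looks increasingly out-of-distribution for the
    negative model (assumed exponential decay, for illustration only).
    """
    w_neg = float(np.exp(-neg_perplexity / ppl_threshold))
    return w_neg * neg_score + (1.0 - w_neg) * pos_score

# In-distribution input: the negative model still gets substantial weight.
print(round(combine_scores(0.9, 0.2, neg_perplexity=10.0), 3))
# Out-of-distribution input: fall back almost entirely on the positive model.
print(round(combine_scores(0.9, 0.2, neg_perplexity=500.0), 3))
```

The point of the gate is that the bias-prone negative model is trusted only where its perplexity says the input resembles its training distribution.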
Computer Conservation: Lily Xu Uses Artificial Intelligence To Stop Poaching Around the World
Lily Xu knew from a young age how much the environment and conservation mattered to her. By 9 years old, she'd already decided to eat vegetarian because, as she put it, "I didn't want to hurt animals." Xu grew up believing her passions would always be separate from her professional interest in computer science. Then she became a graduate student in Milind Tambe's Teamcore Lab, and everything changed. Xu is now doing award-winning research into using machine learning and artificial intelligence to help conservation and anti-poaching efforts around the world.
- Asia > Cambodia (0.08)
- North America > United States > Rhode Island (0.05)
- North America > United States > Maryland (0.05)
- North America > United States > District of Columbia > Washington (0.05)
AI for Social Good
Artificial Intelligence (AI) for social good is a field of work which, broadly speaking, uses AI to make the world a better place. I had a chance to interview two leaders in the field: Dr. Bryan Wilder, who recently received his Ph.D. from Harvard (and will be joining the faculty at Carnegie Mellon next fall), and current Harvard Ph.D. student Lily Xu. Both Bryan and Lily have been advised by Dr. Tambe, Gordon McKay Professor of Computer Science and Director of the Center for Research in Computation and Society (CRCS) at Harvard University and Director of AI for Social Good at Google Research India. While Bryan and Lily are both working at the intersection of AI and social good, they arrived at this junction via different paths. Bryan was studying computer science and looking for a field to apply his knowledge; his search led him to public health.
- Social Sector (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.35)
- Health & Medicine > Therapeutic Area > Immunology (0.35)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.32)
Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples
Assran, Mahmoud, Caron, Mathilde, Misra, Ishan, Bojanowski, Piotr, Joulin, Armand, Ballas, Nicolas, Rabbat, Michael
This paper proposes a novel method of learning by predicting view assignments with support samples (PAWS). The method trains a model to minimize a consistency loss, which ensures that different views of the same unlabeled instance are assigned similar pseudo-labels. The pseudo-labels are generated non-parametrically, by comparing the representations of the image views to those of a set of randomly sampled labeled images. The distance between the view representations and labeled representations is used to provide a weighting over class labels, which we interpret as a soft pseudo-label. By non-parametrically incorporating labeled samples in this way, PAWS extends the distance-metric loss used in self-supervised methods such as BYOL and SwAV to the semi-supervised setting. Despite the simplicity of the approach, PAWS outperforms other semi-supervised methods across architectures, setting a new state-of-the-art for a ResNet-50 on ImageNet trained with either 10% or 1% of the labels, reaching 75.5% and 66.5% top-1 respectively. PAWS requires 4x to 12x less training than the previous best methods.
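The non-parametric pseudo-labeling step described in the PAWS abstract above can be sketched directly: a softmax over similarities to the labeled support samples yields weights, and the weighted average of the support labels is the soft pseudo-label. The cosine-similarity metric, temperature value, and toy shapes are assumptions for illustration:

```python
import numpy as np

def paws_pseudolabel(view_embedding, support_embeddings, support_labels, tau=0.1):
    """Soft pseudo-label in the spirit of PAWS: softmax over similarities to
    labeled support samples, then a similarity-weighted average of their
    one-hot labels (tau and shapes assumed for this sketch)."""
    s = support_embeddings / np.linalg.norm(support_embeddings, axis=1, keepdims=True)
    v = view_embedding / np.linalg.norm(view_embedding)
    weights = np.exp((s @ v) / tau)
    weights /= weights.sum()          # softmax over the support set
    return weights @ support_labels   # soft distribution over class labels

# Toy support set: two labeled samples per class in 2-D.
support = np.array([[1.0, 0.0], [0.9, 0.1],   # class 0
                    [0.0, 1.0], [0.1, 0.9]])  # class 1
labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
view = np.array([0.95, 0.05])                 # an unlabeled view near class 0
soft = paws_pseudolabel(view, support, labels)
print(soft.round(3))
```

The consistency loss then pushes different augmented views of the same unlabeled image toward agreeing soft pseudo-labels, which is what extends BYOL/SwAV-style distance-metric losses to the semi-supervised setting.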