AITopics | Masrani, Vaden

Collaborating Authors

Masrani, Vaden

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Task-Agnostic Language Model Watermarking via High Entropy Passthrough Layers

Masrani, Vaden, Akbari, Mohammad, Yue, David Ming Xuan, Rezaei, Ahmad, Zhang, Yong

arXiv.org Artificial IntelligenceDec-17-2024

In the era of costly pre-training of large language models, ensuring the intellectual property rights of model owners, and insuring that said models are responsibly deployed, is becoming increasingly important. To this end, we propose model watermarking via passthrough layers, which are added to existing pre-trained networks and trained using a self-supervised loss such that the model produces high-entropy output when prompted with a unique private key, and acts normally otherwise. Unlike existing model watermarking methods, our method is fully task-agnostic, and can be applied to both classification and sequence-to-sequence tasks without requiring advanced access to downstream fine-tuning datasets. We evaluate the proposed passthrough layers on a wide range of downstream tasks, and show experimentally our watermarking method achieves a near-perfect watermark extraction accuracy and false-positive rate in most cases without damaging original model performance. Additionally, we show our method is robust to both downstream fine-tuning, fine-pruning, and layer removal attacks, and can be trained in a fraction of the time required to train the original model. Code is available in the paper.

machine learning, natural language, passthrough layer, (19 more...)

arXiv.org Artificial Intelligence

2412.12563

Country: North America > United States (0.46)

Genre: Research Report > Promising Solution (0.54)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation

Gholami, Mohsen, Akbari, Mohammad, Hu, Cindy, Masrani, Vaden, Wang, Z. Jane, Zhang, Yong

arXiv.org Artificial IntelligenceMar-28-2024

Knowledge distillation from LLMs is essential for the efficient deployment of language models. Prior works have proposed data generation using LLMs for preparing distilled models. We argue that generating data with LLMs is prone to sampling mainly from the center of original content distribution. This limitation hinders the distilled model from learning the true underlying data distribution and to forget the tails of the distributions (samples with lower probability). To this end, we propose GOLD, a task-agnostic data generation and knowledge distillation framework, which employs an iterative out-of-distribution-guided feedback mechanism for the LLM. As a result, the generated data improves the generalizability of distilled models. An energy-based OOD evaluation approach is also introduced to deal with noisy generated data. Our extensive experiments on 10 different classification and sequence-to-sequence tasks in NLP show that GOLD respectively outperforms prior arts and the LLM with an average improvement of 5% and 14%. We will also show that the proposed method is applicable to less explored and novel tasks. The code is available.

artificial intelligence, large language model, natural language, (20 more...)

arXiv.org Artificial Intelligence

2403.19754

Country:

Asia > Middle East > UAE (0.14)
North America > United States > Texas (0.14)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Flexible Diffusion Modeling of Long Videos

Harvey, William, Naderiparizi, Saeid, Masrani, Vaden, Weilbach, Christian, Wood, Frank

arXiv.org Artificial IntelligenceDec-15-2022

We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can at test-time sample any arbitrary subset of video frames conditioned on any other subset and present an architecture adapted for this purpose. Doing so allows us to efficiently compare and optimize a variety of schedules for the order in which frames in a long video are sampled and use selective sparse and long-range conditioning on previously sampled frames. We demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length. We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA autonomous driving simulator.

artificial intelligence, machine learning, video, (17 more...)

arXiv.org Artificial Intelligence

2205.11495

Country: North America > Canada (0.68)

Genre: Research Report (1.00)

Industry:

Information Technology (0.48)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)

Add feedback

Proof of the impossibility of probabilistic induction

Masrani, Vaden

arXiv.org Artificial IntelligenceJul-1-2021

In what follows I restate and simplify the proof of the impossibility of probabilistic induction given in [1,2]. Other proofs are possible (cf. Given two statements x, y we write x y if x entails y. For example, if x "All men are mortal" y "Socrates is a man" z "Socrates is mortal" we can write (x y) z. Entailment is a "truth broadcasting" operation that allows the truth of the premises to flow to the conclusion. We also have the following simple theorem.

artificial intelligence, machine learning, probabilistic induction, (12 more...)

arXiv.org Artificial Intelligence

2107.00749

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.49)
Information Technology > Artificial Intelligence > Machine Learning (0.30)

Add feedback

q-Paths: Generalizing the Geometric Annealing Path using Power Means

Masrani, Vaden, Brekelmans, Rob, Bui, Thang, Nielsen, Frank, Galstyan, Aram, Steeg, Greg Ver, Wood, Frank

arXiv.org Artificial IntelligenceJul-1-2021

Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of closed form energy function. In this work, we introduce $q$-paths, a family of paths which is derived from a generalized notion of the mean, includes the geometric and arithmetic mixtures as special cases, and admits a simple closed form involving the deformed logarithm function from nonextensive thermodynamics. Following previous analysis of the geometric path, we interpret our $q$-paths as corresponding to a $q$-exponential family of distributions, and provide a variational representation of intermediate densities as minimizing a mixture of $\alpha$-divergences to the endpoints. We show that small deviations away from the geometric path yield empirical gains for Bayesian inference using Sequential Monte Carlo and generative model evaluation using Annealed Importance Sampling.

artificial intelligence, machine learning, q-exponential family, (14 more...)

arXiv.org Artificial Intelligence

2107.00745

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.86)

Add feedback

Differentiable Particle Filtering without Modifying the Forward Pass

Ścibior, Adam, Masrani, Vaden, Wood, Frank

arXiv.org Machine LearningJun-18-2021

In recent years particle filters have being used as components in systems optimized end-to-end with gradient descent. However, the resampling step in a particle filter is not differentiable, which biases gradients and interferes with optimization. To remedy this problem, several differentiable variants of resampling have been proposed, all of which modify the behavior of the particle filter in significant and potentially undesirable ways. In this paper, we show how to obtain unbiased estimators of the gradient of the marginal likelihood by only modifying messages used in backpropagation, leaving the standard forward pass of a particle filter unchanged. Our method is simple to implement, has a low computational overhead, does not introduce additional hyperparameters, and extends to derivatives of higher orders. We call it stop-gradient resampling, since it can easily be implemented with automatic differentiation libraries using the stop-gradient operator instead of explicitly modifying the backward messages.

artificial intelligence, differentiable particle filtering, machine learning, (2 more...)

arXiv.org Machine Learning

2106.10314

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Annealed Importance Sampling with q-Paths

Brekelmans, Rob, Masrani, Vaden, Bui, Thang, Wood, Frank, Galstyan, Aram, Steeg, Greg Ver, Nielsen, Frank

arXiv.org Artificial IntelligenceDec-14-2020

Annealed importance sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target. While AIS yields an unbiased estimator for any path, existing literature has been primarily limited to the geometric mixture or moment-averaged paths associated with the exponential family and KL divergence. We explore AIS using $q$-paths, which include the geometric path as a special case and are related to the homogeneous power mean, deformed exponential family, and $\alpha$-divergence.

artificial intelligence, exponential family, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2012.07823

Country: North America > Canada > British Columbia (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference

Brekelmans, Rob, Masrani, Vaden, Wood, Frank, Steeg, Greg Ver, Galstyan, Aram

arXiv.org Machine LearningJul-1-2020

The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a "schedule" of intermediate distributions, and model learning suffered with ostensibly tighter bounds. In this work, we propose an exponential family interpretation of the geometric mixture curve underlying the TVO and various path sampling methods, which allows us to characterize the gap in TVO likelihood bounds as a sum of KL divergences. We propose to choose intermediate distributions using equal spacing in the moment parameters of our exponential family, which matches grid search performance and allows the schedule to adaptively update over the course of training. Finally, we derive a doubly reparameterized gradient estimator which improves model learning and allows the TVO to benefit from more refined bounds. To further contextualize our contributions, we provide a unified framework for understanding thermodynamic integration and the TVO using Taylor series remainders.

deep learning, divergence, neural network, (19 more...)

arXiv.org Machine Learning

2007.00642

Country: North America > United States > California (0.28)

Genre: Research Report (0.82)

Industry: Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The Thermodynamic Variational Objective

Masrani, Vaden, Le, Tuan Anh, Wood, Frank

arXiv.org Machine LearningJun-28-2019

We introduce the thermodynamic variational objective (TVO) for learning in both continuous and discrete deep generative models. The TVO arises from a key connection between variational inference and thermodynamic integration that results in a tighter lower bound to the log marginal likelihood than the standard variational evidence lower bound (ELBO), while remaining as broadly applicable. We provide a computationally efficient gradient estimator for the TVO that applies to continuous, discrete, and non-reparameterizable distributions and show that the objective functions used in variational inference, variational autoencoders, wake sleep, and inference compilation are all special cases of the TVO. We evaluate the TVO for learning of discrete and continuous variational auto encoders, and find it achieves state of the art for learning in discrete variable models, and outperform VAEs on continuous variable models without using the reparameterization trick.

deep learning, objective, tvo, (18 more...)

arXiv.org Machine Learning

1907.00031

Country: North America > Canada > British Columbia (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback