
Collaborating Authors

 Nasery, Anshul


Scalable Fingerprinting of Large Language Models

arXiv.org Artificial Intelligence

Model fingerprinting has emerged as a powerful tool for model owners to identify their shared model given API access. However, to lower false discovery rate, fight fingerprint leakage, and defend against coalitions of model users attempting to bypass detection, we argue that scalability is critical, i.e., scaling up the number of fingerprints one can embed into a model. Hence, we pose scalability as a crucial requirement for fingerprinting schemes. We experiment with fingerprint design at a scale significantly larger than previously considered. In typical use-cases, existing methods focus on Harmlessness and Persistence (Xu et al., 2024a; Russinovich & Salem, 2024) of fingerprints. Fingerprinting is Harmless if the utility of the fingerprinted model does not degrade from the base model, and it is Persistent if performing supervised fine-tuning (SFT) on the fingerprinted model with other data does not make the model forget the fingerprints (Jagielski et al., 2023; Chen et al., 2024). While these properties are important, we argue that there is another important criterion for a good fingerprinting scheme not captured by prior work: Scalability. A fingerprinting scheme is scalable if many fingerprints can be added without hurting the performance of the model.
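To make the setting concrete, here is a minimal sketch of how an owner might verify fingerprints over a black-box API, assuming each fingerprint is a (key prompt, expected response) pair embedded beforehand (e.g., via SFT). The `generate` callable and the report structure are illustrative placeholders, not the paper's actual interface.

```python
# Hedged sketch: black-box fingerprint verification over (key, response) pairs.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class FingerprintReport:
    matched: int
    total: int

    @property
    def detection_rate(self) -> float:
        return self.matched / max(self.total, 1)

def verify_fingerprints(
    generate: Callable[[str], str],       # black-box API access to the suspect model
    fingerprints: List[Tuple[str, str]],  # (key prompt, expected response) pairs
) -> FingerprintReport:
    # A fingerprint "fires" if the expected response appears in the model output.
    matched = sum(1 for key, resp in fingerprints if resp in generate(key))
    return FingerprintReport(matched=matched, total=len(fingerprints))
```

With many embedded fingerprints (the scalability requirement above), even partial matches across a large key set can yield a confident ownership claim at a low false discovery rate.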


OML: Open, Monetizable, and Loyal AI

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) has steadily improved across a wide range of tasks. However, the development and deployment of AI are almost entirely controlled by a few powerful organizations that are racing to create Artificial General Intelligence (AGI). The centralized entities make decisions with little public oversight, shaping the future of humanity, often with unforeseen consequences. In this paper, we propose OML, which stands for Open, Monetizable, and Loyal AI, an approach designed to democratize AI development. OML is realized through an interdisciplinary framework spanning AI, blockchain, and cryptography. We present several ideas for constructing OML using technologies such as Trusted Execution Environments (TEE), traditional cryptographic primitives like fully homomorphic encryption and functional encryption, obfuscation, and AI-native solutions rooted in the sample complexity and intrinsic hardness of AI tasks. A key innovation of our work is introducing a new scientific field: AI-native cryptography. Unlike conventional cryptography, which focuses on discrete data and binary security guarantees, AI-native cryptography exploits the continuous nature of AI data representations and their low-dimensional manifolds, focusing on improving approximate performance. One core idea is to transform AI attack methods, such as data poisoning, into security tools. This novel approach serves as a foundation for OML 1.0 which uses model fingerprinting to protect the integrity and ownership of AI models. The spirit of OML is to establish a decentralized, open, and transparent platform for AI development, enabling the community to contribute, monetize, and take ownership of AI models. By decentralizing control and ensuring transparency through blockchain technology, OML prevents the concentration of power and provides accountability in AI development that has not been possible before.


PLeaS -- Merging Models with Permutations and Least Squares

arXiv.org Artificial Intelligence

The democratization of machine learning systems has made the process of fine-tuning accessible to a large number of practitioners, leading to a wide range of open-source models fine-tuned on specialized tasks and datasets. Recent work has proposed to merge such models to combine their functionalities. However, prior approaches are restricted to models that are fine-tuned from the same base model. Furthermore, the final merged model is typically restricted to be of the same size as the original models. In this work, we propose a new two-step algorithm to merge models, termed PLeaS, which relaxes these constraints. First, leveraging the Permutation symmetries inherent in the two models, PLeaS partially matches nodes in each layer by maximizing alignment. Next, PLeaS computes the weights of the merged model as a layer-wise Least Squares solution that minimizes the approximation error between the features of the merged model and the permuted features of the original models, combining the two networks into a single model of a desired size, even when the original models are fine-tuned from different base models. We also present a variant of our method which can merge models without using data from the fine-tuning domains. We demonstrate our method by merging ResNet models trained with shared and different label spaces, and show that we can perform better than state-of-the-art merging methods by 8 to 15 percentage points for the same target compute while merging models trained on DomainNet and on fine-grained classification tasks.
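A minimal sketch of the two steps for a single linear layer, under simplifying assumptions (equal layer widths, pre-collected activations); this illustrates permutation matching plus layer-wise least squares, not the authors' released implementation.

```python
# Hedged sketch of PLeaS-style merging for one layer.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_units(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Step 1: permute units of model B to align with model A.

    acts_a, acts_b: (n_samples, n_units) activations of the same layer.
    Returns perm such that unit j of B is matched to unit perm[j] of A.
    """
    corr = acts_a.T @ acts_b                 # (n_units, n_units) similarity
    row, col = linear_sum_assignment(-corr)  # maximize total alignment
    perm = np.empty_like(col)
    perm[col] = row
    return perm

def merge_layer(x_merged: np.ndarray, target_feats: np.ndarray) -> np.ndarray:
    """Step 2: layer-wise least squares.

    x_merged: (n_samples, d_in) inputs the merged layer will actually see.
    target_feats: (n_samples, d_out) average of the (permuted) features the
    original models produce at this layer.
    Returns merged weights W with x_merged @ W ~= target_feats.
    """
    W, *_ = np.linalg.lstsq(x_merged, target_feats, rcond=None)
    return W
```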


PEEKABOO: Interactive Video Generation via Masked-Diffusion

arXiv.org Artificial Intelligence

Recently there has been a lot of progress in text-to-video generation, with state-of-the-art models capable of generating high-quality, realistic videos. However, these models lack the capability for users to interactively control and generate videos, which could unlock new areas of application. As a first step towards this goal, we tackle the problem of endowing diffusion-based video generation models with interactive spatio-temporal control over their output. To this end, we take inspiration from recent advances in the segmentation literature to propose a novel spatio-temporal masked attention module, Peekaboo. This module is a training-free, no-inference-overhead addition to off-the-shelf video generation models which enables spatio-temporal control. We also propose an evaluation benchmark for the interactive video generation task. Through extensive qualitative and quantitative evaluation, we establish that Peekaboo enables controlled video generation and even obtains a gain of up to 3.8x in mIoU over baseline models.
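A minimal sketch of the underlying idea of spatio-temporal masked attention: attention scores outside a user-provided region are suppressed before the softmax. The tensor shapes and mask convention are illustrative assumptions, not Peekaboo's exact module.

```python
# Hedged sketch: attention restricted by a per-frame spatial region mask.
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, region_mask):
    """
    q: (batch, n_query, d)   latent-pixel queries for one frame
    k, v: (batch, n_key, d)  keys/values (e.g., text tokens)
    region_mask: (batch, n_query, n_key) bool, True where attention is allowed
                 (e.g., pixels inside the user-drawn box for this frame).
    """
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    # Suppress disallowed positions; a large negative fill avoids NaNs if a row is fully masked.
    scores = scores.masked_fill(~region_mask, torch.finfo(scores.dtype).min)
    return F.softmax(scores, dim=-1) @ v
```

Because only the attention scores are edited, such a mask can in principle be dropped into a pretrained video diffusion model without retraining and with no extra inference cost, which is the property the abstract emphasizes.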


Label Differential Privacy via Aggregation

arXiv.org Machine Learning

In many real-world applications, due to recent developments in the privacy landscape, training data may be aggregated to preserve the privacy of sensitive training labels. In the learning from label proportions (LLP) framework, the dataset is partitioned into bags of feature-vectors which are available only with the sum of the labels per bag. A further restriction, which we call learning from bag aggregates (LBA), is where instead of individual feature-vectors, only the (possibly weighted) sum of the feature-vectors per bag is available. We study whether such aggregation techniques can provide privacy guarantees under the notion of label differential privacy (label-DP) previously studied in, e.g., [Chaudhuri-Hsu'11, Ghazi et al.'21, Esfandiari et al.'22]. It is easily seen that naive LBA and LLP do not provide label-DP. Our main result, however, shows that weighted LBA using iid Gaussian weights with $m$ randomly sampled disjoint $k$-sized bags is in fact $(\varepsilon, \delta)$-label-DP for any $\varepsilon > 0$ with $\delta \approx \exp(-\Omega(\sqrt{k}))$, assuming a lower bound on the linear-mse regression loss. Further, the $\ell_2^2$-regressor which minimizes the loss on the aggregated dataset has a loss within a $\left(1 + o(1)\right)$-factor of the optimum on the original dataset w.p. $\approx 1 - \exp(-\Omega(m))$. We emphasize that no additive label noise is required. The analogous weighted-LLP does not, however, admit label-DP. Nevertheless, we show that if additive $N(0, 1)$ noise can be added to any constant fraction of the instance labels, then the noisy weighted-LLP admits similar label-DP guarantees without assumptions on the dataset, while preserving the utility of Lipschitz-bounded neural mse-regression tasks. Our work is the first to demonstrate that label-DP can be achieved by randomly weighted aggregation for regression tasks, using no or little additive noise.
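A minimal sketch of the weighted LBA mechanism described above: sample $m$ disjoint $k$-sized bags, draw iid $N(0,1)$ weights, release only the weighted sums of features and labels per bag, and fit the regressor on the aggregates. The data and constants are illustrative, not from the paper.

```python
# Hedged sketch: weighted bag aggregation (LBA) + least squares on aggregates.
import numpy as np

def weighted_lba(X: np.ndarray, y: np.ndarray, m: int, k: int, rng=None):
    """Return m weighted bag aggregates (sum_i w_i x_i, sum_i w_i y_i)."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(X))[: m * k].reshape(m, k)  # m disjoint k-sized bags
    w = rng.standard_normal((m, k))                        # iid N(0, 1) weights
    X_agg = np.einsum("bk,bkd->bd", w, X[idx])             # weighted feature sums
    y_agg = np.einsum("bk,bk->b", w, y[idx])               # weighted label sums
    return X_agg, y_agg

# The l2^2-regressor is then fit on the aggregates only; no per-example labels
# (and no additive label noise) are ever released.
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(10_000)
X_agg, y_agg = weighted_lba(X, y, m=500, k=20, rng=1)
theta, *_ = np.linalg.lstsq(X_agg, y_agg, rcond=None)
```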


End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates

arXiv.org Artificial Intelligence

Neural network (NN) compression via techniques such as pruning and quantization requires setting compression hyperparameters (e.g., the number of channels to prune, bitwidths for quantization) for each layer either manually or via neural architecture search (NAS), which can be computationally expensive. We address this problem by providing an end-to-end technique that optimizes for the model's Floating Point Operations (FLOPs) or for on-device latency via a novel $\frac{\ell_1}{\ell_2}$ latency surrogate. Our algorithm is versatile and can be used with many popular compression methods including pruning, low-rank factorization, and quantization. Crucially, it is fast and runs in almost the same amount of time as single model training, which is a significant training speed-up over standard NAS methods. For BERT compression on GLUE fine-tuning tasks, we achieve a $50\%$ reduction in FLOPs with only a $1\%$ drop in performance. For compressing MobileNetV3 on ImageNet-1K, we achieve a $15\%$ reduction in FLOPs and an $11\%$ reduction in on-device latency without a drop in accuracy, while still requiring $3\times$ less training compute than SOTA compression techniques. Finally, for transfer learning on smaller datasets, our technique identifies $1.2\times$-$1.4\times$ cheaper architectures than the standard MobileNetV3 and EfficientNet suite of architectures at almost the same training cost and accuracy.
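A minimal sketch of how an $\frac{\ell_1}{\ell_2}$-style surrogate can enter a single end-to-end training objective: learnable per-channel gates multiply each channel, and $(\lVert\alpha\rVert_1/\lVert\alpha\rVert_2)^2$ acts as a differentiable proxy for the number of active channels. The exact per-layer FLOPs model and gate placement are assumptions for illustration, not the paper's formulation.

```python
# Hedged sketch: channel gates with an l1/l2 surrogate added to the task loss.
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    def __init__(self, c_in, c_out, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size, padding=kernel_size // 2)
        self.alpha = nn.Parameter(torch.ones(c_out))  # learnable channel gates

    def forward(self, x):
        return self.conv(x) * self.alpha.view(1, -1, 1, 1)

    def flops_surrogate(self):
        # (l1/l2)^2 is a smooth stand-in for the count of non-zero gates.
        a = self.alpha.abs()
        return (a.sum() / (a.norm() + 1e-8)) ** 2

def total_loss(task_loss, layers, lam=1e-3):
    # One differentiable objective: accuracy term + FLOPs/latency surrogate,
    # trained in roughly a single run instead of an outer NAS loop.
    return task_loss + lam * sum(layer.flops_surrogate() for layer in layers)
```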


Training for the Future: A Simple Gradient Interpolation Loss to Generalize Along Time

arXiv.org Machine Learning

In several real-world applications, machine learning models are deployed to make predictions on data whose distribution changes gradually along time, leading to a drift between the train and test distributions. Such models are often re-trained on new data periodically, and hence need to generalize to data not too far into the future. In this context, there is much prior work on enhancing temporal generalization, e.g., continuous transportation of past data, kernel-smoothed time-sensitive parameters, and more recently, adversarial learning of time-invariant features. However, these methods share several limitations, e.g., poor scalability, training instability, and dependence on unlabeled data from the future. Responding to the above limitations, we propose a simple method that starts with a model with time-sensitive parameters but regularizes its temporal complexity using a Gradient Interpolation (GI) loss. GI allows the decision boundary to change along time, yet prevents overfitting to the limited training time snapshots by allowing task-specific control over changes along time. We compare our method to existing baselines on multiple real-world datasets and show that GI outperforms more complicated generative and adversarial approaches on the one hand, and simpler gradient regularization methods on the other.
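A minimal sketch of a GI-style loss for a time-conditioned regressor f(x, t): the prediction is extrapolated to a nearby time t + delta via a first-order Taylor term computed with autograd, and the extrapolated prediction is also penalized against the label. The hyperparameters and the exact penalty form are illustrative assumptions.

```python
# Hedged sketch: Gradient Interpolation-style loss for a scalar regressor f(x, t).
import torch
import torch.nn.functional as F

def gi_loss(model, x, y, t, delta_max=0.5):
    """model(x, t) -> (batch, 1) predictions; t: (batch, 1) timestamps; y: (batch,)."""
    t = t.clone().requires_grad_(True)
    pred = model(x, t).squeeze(-1)
    loss_now = F.mse_loss(pred, y)

    # d pred / d t, with create_graph=True so the penalty itself is trainable.
    grad_t = torch.autograd.grad(pred.sum(), t, create_graph=True)[0].squeeze(-1)

    delta = (2 * torch.rand_like(grad_t) - 1) * delta_max  # random nearby time offset
    pred_future = pred + delta * grad_t                    # first-order extrapolation in time
    loss_future = F.mse_loss(pred_future, y)

    return loss_now + loss_future
```

The delta_max knob plays the role of task-specific control over how much the decision boundary is allowed to change along time.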


What if Neural Networks had SVDs?

arXiv.org Machine Learning

Various Neural Networks employ time-consuming matrix operations like matrix inversion. Many such matrix operations are faster to compute given the Singular Value Decomposition (SVD). Previous work allows using the SVD in Neural Networks without computing it. In theory, these techniques can speed up matrix operations; in practice, however, they are not fast enough. We present an algorithm that is fast enough to speed up several matrix operations. The algorithm increases the degree of parallelism of an underlying matrix multiplication $H\cdot X$ where $H$ is an orthogonal matrix represented by a product of Householder matrices. Code is available at www.github.com/AlexanderMath/fasth.
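For reference, here is a minimal sketch of the operation being accelerated: multiplying X by an orthogonal matrix H given as a product of d Householder reflections, applied sequentially. This is the slow baseline, not the FastH algorithm itself, whose contribution is to expose more parallelism in this product.

```python
# Hedged sketch: sequential application of H = H_1 H_2 ... H_d to X.
import numpy as np

def householder_product_apply(V: np.ndarray, X: np.ndarray) -> np.ndarray:
    """V: (d, n) Householder vectors (one per row), X: (n, m). Returns H @ X."""
    for v in V[::-1]:                    # H_d is applied to X first, then H_{d-1}, ...
        v = v / np.linalg.norm(v)
        X = X - 2.0 * np.outer(v, v @ X)  # (I - 2 v v^T) X without forming H explicitly
    return X
```

Each reflection depends on the output of the previous one, which is exactly the sequential bottleneck a higher-parallelism algorithm targets on GPUs.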


CognitiveCNN: Mimicking Human Cognitive Models to resolve Texture-Shape Bias

arXiv.org Artificial Intelligence

Recent works demonstrate the texture bias in Convolutional Neural Networks (CNNs), conflicting with earlier works claiming that networks identify objects using shape. It is commonly believed that the cost function forces the network to take a greedy route to increase accuracy using texture, failing to explore any global statistics. We propose a novel, intuitive architecture, namely CognitiveCNN, inspired by feature integration theory in psychology, to utilise human-interpretable features like shape, texture, and edges to reconstruct and classify the image. We define two metrics, namely TIC and RIC, to quantify the importance of each stream using attention maps. We introduce a regulariser which ensures that the contribution of each feature is the same for any task as it is for reconstruction, and perform experiments to show the resulting boost in accuracy and robustness, besides imparting explainability. Lastly, we adapt these ideas to conventional CNNs and propose Augmented Cognitive CNN to achieve superior performance in object recognition.
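A minimal sketch in the spirit of the described regulariser: per-stream importance is estimated from attention maps for both the classification and the reconstruction head, and the mismatch between the two is penalized so each stream (shape, texture, edges, ...) contributes to classification roughly as it does to reconstruction. Names, shapes, and the importance estimate are illustrative assumptions, not the paper's exact TIC/RIC definitions.

```python
# Hedged sketch: equalizing per-stream contributions across tasks.
import torch

def stream_contributions(attn_maps: torch.Tensor) -> torch.Tensor:
    """attn_maps: (n_streams, H, W) attention maps for one task.
    Returns a (n_streams,) normalized distribution of stream importance."""
    per_stream = attn_maps.flatten(1).mean(dim=1)
    return per_stream / per_stream.sum()

def contribution_regularizer(attn_cls: torch.Tensor, attn_rec: torch.Tensor) -> torch.Tensor:
    """Penalize streams whose importance for classification deviates from their
    importance for reconstruction (used as the reference task)."""
    c_cls = stream_contributions(attn_cls)
    c_rec = stream_contributions(attn_rec)
    return ((c_cls - c_rec) ** 2).sum()
```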