AITopics | Willmott, Devin

Willmott, Devin

HyperCLIP: Adapting Vision-Language models with Hypernetworks

Akinwande, Victor, Norouzzadeh, Mohammad Sadegh, Willmott, Devin, Bair, Anna, Ganesh, Madan Ravi, Kolter, J. Zico

arXiv.org Artificial Intelligence, Dec-21-2024

Self-supervised vision-language models trained with contrastive objectives form the basis of current state-of-the-art methods in AI vision tasks. The success of these models is a direct consequence of the huge web-scale datasets used to train them, but they require correspondingly large vision components to properly learn powerful and general representations from such a broad data domain. This poses a challenge for deploying large vision-language models, especially in resource-constrained environments. To address this, we propose an alternate vision-language architecture, called HyperCLIP, that uses a small image encoder along with a hypernetwork that dynamically adapts image encoder weights to each new set of text inputs. All three components of the model (hypernetwork, image encoder, and text encoder) are pre-trained jointly end-to-end, and with a trained HyperCLIP model, we can generate new zero-shot, deployment-friendly image classifiers for any task with a single forward pass through the text encoder and hypernetwork. HyperCLIP increases the zero-shot accuracy of SigLIP-trained models with small image encoders by up to 3% on ImageNet and 5% on CIFAR-100 with minimal training throughput overhead.

A now-standard approach in deep learning for vision tasks is to first pre-train a model on web-scale data and then adapt this model for a specific task using little or no additional data. Despite the widespread success of these models and their lack of reliance on large-scale labeled datasets, a significant downside is that these models are often on the order of billions of parameters, much larger than their supervised counterparts for a given task at the same accuracy level. While these pre-trained models are powerful due to their generality, practitioners still need to apply them to well-defined and specific tasks. We consider settings where there are additional constraints on the size of these models, such as in edge computing applications.
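To make the data flow in the abstract concrete, here is a minimal toy sketch of a text-conditioned image classifier. It is not the authors' implementation: the random linear maps `W_text`, `W_img`, and `W_hyper`, the mean pooling, and the choice to adapt only a per-channel scale and shift are all assumptions made for illustration, since the abstract does not say which encoder weights the hypernetwork produces.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_EMB, N_CLASSES = 32, 16, 5  # toy sizes, chosen only for illustration

# Hypothetical stand-ins for the three jointly trained components.
W_text = rng.normal(size=(D_EMB, D_EMB)) / np.sqrt(D_EMB)       # "text encoder"
W_img = rng.normal(size=(D_IMG, D_EMB)) / np.sqrt(D_IMG)        # small "image encoder"
W_hyper = rng.normal(size=(D_EMB, 2 * D_EMB)) / np.sqrt(D_EMB)  # "hypernetwork"


def l2_normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def encode_text(prompt_features):
    """Map per-class prompt features (K, D_EMB) to unit-norm text embeddings."""
    return l2_normalize(prompt_features @ W_text)


def hypernetwork(text_emb):
    """Turn the pooled text embeddings into a per-channel scale and shift.

    Assumption: only a scale/shift of the encoder output is adapted here.
    """
    pooled = text_emb.mean(axis=0)
    out = pooled @ W_hyper
    scale = 1.0 + 0.1 * np.tanh(out[:D_EMB])
    shift = 0.1 * np.tanh(out[D_EMB:])
    return scale, shift


def build_classifier(prompt_features):
    """One pass through text encoder and hypernetwork yields a task-specific classifier."""
    text_emb = encode_text(prompt_features)
    scale, shift = hypernetwork(text_emb)

    def classify(images):
        feats = l2_normalize((images @ W_img) * scale + shift)
        logits = feats @ text_emb.T  # cosine similarity to each class embedding
        return logits.argmax(axis=1), logits

    return classify


# Usage: build a classifier for a new label set, then run it on a batch of images.
classify = build_classifier(rng.normal(size=(N_CLASSES, D_EMB)))
preds, logits = classify(rng.normal(size=(4, D_IMG)))
```

The point of the sketch is the deployment pattern: once `build_classifier` has run for a label set, only the small adapted image encoder and the cached text embeddings are needed at inference time.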

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.16777

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Leveraging Foundation Models to Improve Lightweight Clients in Federated Learning

Wu, Xidong, Lin, Wan-Yi, Willmott, Devin, Condessa, Filipe, Huang, Yufei, Li, Zhenzhen, Ganesh, Madan Ravi

arXiv.org Artificial Intelligence, Nov-14-2023

Federated Learning (FL) is a distributed training paradigm that enables clients scattered across the world to cooperatively learn a global model without divulging confidential data. However, FL faces a significant challenge in the form of heterogeneous data distributions among clients, which leads to a reduction in performance and robustness. A recent approach to mitigating the impact of heterogeneous data distributions is the use of foundation models, which offer better performance at the cost of larger computational overheads and slower inference speeds. We introduce foundation model distillation to assist in the federated training of lightweight client models and increase their performance under heterogeneous data settings while keeping inference costs low. Our results show improvement in the global model performance on a balanced testing set, which contains rarely observed samples, even under extreme non-IID client data distributions. We conduct a thorough evaluation of our framework with different foundation model backbones on CIFAR10, with varying degrees of heterogeneous data distributions ranging from class-specific data partitions across clients to Dirichlet data sampling, parameterized by values between 0.01 and 1.0.
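The Dirichlet data sampling mentioned above is commonly implemented by drawing, for each class, a vector of client proportions from a symmetric Dirichlet distribution. The sketch below shows that common recipe; the function name `dirichlet_partition` and its arguments are hypothetical, and the paper's exact partitioning code may differ.

```python
import numpy as np


def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with label skew controlled by alpha.

    Smaller alpha concentrates each class on fewer clients (more non-IID);
    larger alpha spreads each class more evenly.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut them by sampled proportions.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, chunk in enumerate(np.split(idx, cut_points)):
            client_indices[client].extend(chunk.tolist())
    return client_indices


# Usage: 1000 samples over 10 classes, split across 4 clients.
labels = np.arange(1000) % 10
parts = dirichlet_partition(labels, n_clients=4, alpha=0.5)
```

Every sample is assigned to exactly one client, so the partition can be checked by confirming that the concatenated index lists cover the dataset without duplicates.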

artificial intelligence, foundation model, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2311.08479

Country: North America > United States (0.15)

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Hard Label Black-box Adversarial Attacks in Low Query Budget Regimes

Shukla, Satya Narayan, Sahu, Anit Kumar, Willmott, Devin, Kolter, J. Zico

arXiv.org Machine Learning, Jul-13-2020

We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples for deep learning models based solely on the output labels (hard labels) returned for queried data inputs. We use Bayesian optimization (BO) to specifically cater to scenarios involving low query budgets to develop efficient adversarial attacks. Issues with BO's performance in high dimensions are avoided by searching for adversarial examples in a structured low-dimensional subspace. Our proposed approach achieves better performance than state-of-the-art black-box adversarial attacks that require orders of magnitude more queries than ours.
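As a rough illustration of the hard-label setting with a query budget, the sketch below wraps a classifier so that an attacker sees only the predicted label and every call is counted. The class `HardLabelOracle`, the helper `attack_succeeded`, and the toy threshold classifier are hypothetical names introduced here, not part of the paper; the optimizer that proposes perturbations is omitted.

```python
import numpy as np


class HardLabelOracle:
    """Expose only the predicted label of a model and count queries against a budget."""

    def __init__(self, predict_label, budget):
        self.predict_label = predict_label
        self.budget = budget
        self.queries = 0

    def __call__(self, x):
        if self.queries >= self.budget:
            raise RuntimeError("query budget exhausted")
        self.queries += 1
        return int(self.predict_label(x))


def attack_succeeded(oracle, x, delta, true_label):
    """Binary objective: 1.0 if the perturbed input is misclassified, else 0.0."""
    x_adv = np.clip(x + delta, 0.0, 1.0)  # keep pixels in the valid range
    return float(oracle(x_adv) != true_label)


# Toy classifier (assumption): label 1 if the mean pixel value exceeds 0.5.
oracle = HardLabelOracle(lambda x: x.mean() > 0.5, budget=2)
x = np.full((4, 4), 0.4)
result = attack_succeeded(oracle, x, np.full((4, 4), 0.2), true_label=0)
```

Because the objective is binary, each query carries very little information, which is why the number of queries is the natural cost measure in this setting.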

air transportation, deep learning, perturbation, (20 more...)

arXiv.org Machine Learning

2007.0721

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Black-box Adversarial Attacks with Bayesian Optimization

Shukla, Satya Narayan, Sahu, Anit Kumar, Willmott, Devin, Kolter, J. Zico

arXiv.org Machine Learning, Sep-30-2019

October 1, 2019

Abstract: We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples using information limited to loss function evaluations of input-output pairs. We use Bayesian optimization (BO) to specifically cater to scenarios involving low query budgets to develop query-efficient adversarial attacks. We alleviate the issues surrounding BO with regard to optimizing high-dimensional deep learning models by effective dimension upsampling techniques. Our proposed approach achieves performance comparable to the state-of-the-art black-box adversarial attacks, albeit with a much lower average query count. In particular, in low query budget regimes, our proposed method reduces the query count by up to 80% with respect to the state-of-the-art methods.

1 Introduction: Neural networks are now well-known to be vulnerable to adversarial examples: additive perturbations that, when applied to the input, change the network's output classification [9]. Work investigating this lack of robustness to adversarial examples often takes the form of a back-and-forth between newly proposed adversarial attacks, methods for quickly and efficiently crafting adversarial examples, and corresponding defenses that modify the classifier at either training or test time to improve robustness. The most successful adversarial attacks use gradient-based optimization methods [9, 17], which require complete knowledge of the architecture and parameters of the target network; this assumption is referred to as the white-box attack setting.

deep learning, neural network, perturbation, (17 more...)

arXiv.org Machine Learning

1909.13857

Country:

North America > United States > Massachusetts (0.14)
North America > United States > Hawaii (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.84)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Orthogonal Recurrent Neural Networks with Scaled Cayley Transform

Helfrich, Kyle, Willmott, Devin, Ye, Qiang

arXiv.org Machine Learning, Nov-14-2017

Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Recent work on Unitary Recurrent Neural Networks (uRNNs) has addressed this issue and, in some cases, exceeded the capabilities of Long Short-Term Memory networks (LSTMs). We propose a simpler and novel update scheme to maintain orthogonal recurrent weight matrices without using complex-valued matrices. This is done by parametrizing with a skew-symmetric matrix using the Cayley transform. Such a parametrization is unable to represent matrices with eigenvalues equal to negative one, but this limitation is overcome by scaling the recurrent weight matrix by a diagonal matrix consisting of ones and negative ones. The proposed training scheme involves a straightforward gradient calculation and update step. In several experiments, the proposed scaled Cayley orthogonal recurrent neural network (scoRNN) achieves superior results with fewer trainable parameters than other unitary RNNs.
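The parametrization described above can be checked numerically. The sketch below builds W = (I + A)^{-1}(I - A)D from a skew-symmetric A and a diagonal D of ones and negative ones, then confirms that W is orthogonal. The matrix size, the number of negative ones `rho`, and the random initialization are arbitrary illustrative values, and the gradient update through the transform is not shown.

```python
import numpy as np


def scaled_cayley(A, D):
    """Return W = (I + A)^{-1} (I - A) D.

    A must be skew-symmetric (A.T == -A), which makes I + A invertible;
    D is diagonal with entries +1 or -1.
    """
    I = np.eye(A.shape[0])
    # Solving (I + A) X = (I - A) avoids forming the inverse explicitly.
    return np.linalg.solve(I + A, I - A) @ D


rng = np.random.default_rng(0)
n, rho = 6, 2  # rho = number of -1 entries on the diagonal (illustrative values)

M = rng.normal(size=(n, n))
A = M - M.T  # any matrix minus its transpose is skew-symmetric
D = np.diag(np.concatenate([-np.ones(rho), np.ones(n - rho)]))

W = scaled_cayley(A, D)
orthogonality_error = np.abs(W.T @ W - np.eye(n)).max()
```

Training only the free entries of A keeps W orthogonal by construction, so no projection or re-orthogonalization step is needed after an update.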

deep learning, matrix, neural network, (17 more...)

arXiv.org Machine Learning

1707.0952

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Kentucky > Fayette County > Lexington (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
