AITopics | Weller, Adrian

Weller, Adrian

Diffused Redundancy in Pre-trained Representations

Nanda, Vedant, Speicher, Till, Dickerson, John P., Feizi, Soheil, Gummadi, Krishna P., Weller, Adrian

arXiv.org Artificial Intelligence, Nov-14-2023

Representations learned by pre-training a neural network on a large dataset are increasingly used successfully to perform a variety of downstream tasks. In this work, we take a closer look at how features are encoded in such pre-trained representations. We find that learned representations in a given layer exhibit a degree of diffuse redundancy, i.e., any randomly chosen subset of neurons in the layer that is larger than a threshold size shares a large degree of similarity with the full layer and is able to perform similarly to the whole layer on a variety of downstream tasks. For example, a linear probe trained on 20% of randomly picked neurons from the penultimate layer of a ResNet50 pre-trained on ImageNet1k achieves an accuracy within 5% of a linear probe trained on the full layer of neurons for downstream CIFAR10 classification. We conduct experiments on different neural architectures (including CNNs and Transformers) pre-trained on both ImageNet1k and ImageNet21k and evaluate a variety of downstream tasks taken from the VTAB benchmark. We find that the loss and dataset used during pre-training largely govern the degree of diffuse redundancy and the "critical mass" of neurons needed often depends on the downstream task, suggesting that there is a task-inherent redundancy-performance Pareto frontier. Our findings shed light on the nature of representations learned by pre-trained deep neural networks and suggest that entire layers might not be necessary to perform many downstream tasks. We investigate the potential for exploiting this redundancy to achieve efficient generalization for downstream tasks and also draw caution to certain possible unintended consequences. Our code is available at https://github.com/nvedant07/diffused-redundancy.
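
A minimal sketch of the random-subset linear probe described in the abstract above, assuming the layer's features have already been extracted into NumPy arrays. The function names, the ridge-regularised least-squares probe, and the default values are illustrative assumptions and not the authors' implementation, which lives in the repository linked in the abstract.

```python
import numpy as np

def fit_linear_probe(X, y, n_classes, l2=1e-3):
    """Fit a linear probe by ridge regression onto one-hot labels (an assumed, simple probe)."""
    Y = np.eye(n_classes)[y]                          # one-hot targets
    Xb = np.hstack([X, np.ones((len(X), 1))])         # append a bias column
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def probe_accuracy(weights, X, y):
    """Top-1 accuracy of a fitted probe."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return float(np.mean(np.argmax(Xb @ weights, axis=1) == y))

def random_subset_probe(X_train, y_train, X_test, y_test,
                        fraction, n_classes, seed=0):
    """Train and evaluate a probe on a random `fraction` of the neurons (columns)."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    k = max(1, int(round(fraction * d)))
    idx = rng.choice(d, size=k, replace=False)        # neurons kept, sampled without replacement
    weights = fit_linear_probe(X_train[:, idx], y_train, n_classes)
    return probe_accuracy(weights, X_test[:, idx], y_test), idx
```

Comparing the accuracy returned for `fraction=0.2` with the accuracy for `fraction=1.0` on the same features gives the kind of subset-versus-full-layer comparison the abstract reports.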

artificial intelligence, machine learning, redundancy, (16 more...)

arXiv.org Artificial Intelligence

2306.00183

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.14)

Genre: Research Report > New Finding (0.65)

Industry:

Health & Medicine (1.00)
Transportation (0.67)
Government > Regional Government > North America Government > United States Government (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

Liu, Weiyang, Qiu, Zeju, Feng, Yao, Xiu, Yuliang, Xue, Yuxuan, Yu, Longhui, Feng, Haiwen, Liu, Zhen, Heo, Juyeon, Peng, Songyou, Wen, Yandong, Black, Michael J., Weller, Adrian, Schölkopf, Bernhard

arXiv.org Artificial Intelligence, Nov-10-2023

Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
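
To illustrate the butterfly idea in the abstract above, here is a hedged sketch that builds a dense orthogonal matrix as a product of sparse butterfly factors, in the spirit of the Cooley-Tukey structure. Using 2x2 rotations as the building block, and the names `butterfly_factor` and `butterfly_orthogonal`, are assumptions made for illustration; the abstract does not specify the paper's exact block parameterization.

```python
import numpy as np

def butterfly_factor(d, stride, angles):
    """One sparse factor: d/2 disjoint 2x2 rotations pairing index i with i + stride."""
    B = np.zeros((d, d))
    pair = 0
    for block_start in range(0, d, 2 * stride):
        for offset in range(stride):
            i = block_start + offset
            j = i + stride
            c, s = np.cos(angles[pair]), np.sin(angles[pair])
            B[i, i], B[i, j] = c, -s
            B[j, i], B[j, j] = s, c
            pair += 1
    return B

def butterfly_orthogonal(d, angles):
    """Multiply log2(d) butterfly factors; `angles` has shape (log2(d), d // 2)."""
    m = d.bit_length() - 1
    if 2 ** m != d:
        raise ValueError("d must be a power of two in this sketch")
    R = np.eye(d)
    for level in range(m):
        R = butterfly_factor(d, 2 ** level, angles[level]) @ R
    return R
```

With `d = 16` this uses `(d / 2) * log2(d) = 32` angles, whereas a general `d x d` orthogonal matrix has `d * (d - 1) / 2 = 120` degrees of freedom, which is the kind of parameter saving the abstract refers to.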

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2311.06243

Country:

Europe (0.92)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Evaluating Language Models for Mathematics through Interactions

Collins, Katherine M., Jiang, Albert Q., Frieder, Simon, Wong, Lionel, Zilka, Miri, Bhatt, Umang, Lukasiewicz, Thomas, Wu, Yuhuai, Tenenbaum, Joshua B., Hart, William, Gowers, Timothy, Li, Wenda, Weller, Adrian, Jamnik, Mateja

arXiv.org Artificial Intelligence, Nov-5-2023

There is much excitement about the opportunity to harness the power of large language models (LLMs) when building problem-solving assistants. However, the standard methodology of evaluating LLMs relies on static pairs of inputs and outputs, and is insufficient for making an informed decision about which LLMs can be sensibly used, and in which assistive settings. Static assessment fails to account for the essential interactive element in LLM deployment, and therefore limits how we understand language model capabilities. We introduce CheckMate, an adaptable prototype platform for humans to interact with and evaluate LLMs. We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics, with a mixed cohort of participants from undergraduate students to professors of mathematics. We release the resulting interaction and rating dataset, MathConverse. By analysing MathConverse, we derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness in LLM generations, amongst other findings. Further, we garner a more granular understanding of GPT-4 mathematical problem-solving through a series of case studies, contributed by expert mathematicians. We conclude with actionable takeaways for ML practitioners and mathematicians: models that communicate uncertainty, respond well to user corrections, and are more interpretable and concise may constitute better assistants. Interactive evaluation is a promising way to navigate the capability of these models; humans should be aware of language models' algebraic fallibility and discern where they are appropriate to use.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2306.01694

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
North America > Canada > Ontario > Toronto (0.13)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.13)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.45)
Personal > Interview (0.45)
Research Report > Experimental Study (0.45)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Getting aligned on representational alignment

Sucholutsky, Ilia, Muttenthaler, Lukas, Weller, Adrian, Peng, Andi, Bobu, Andreea, Kim, Been, Love, Bradley C., Grant, Erin, Groen, Iris, Achterberg, Jascha, Tenenbaum, Joshua B., Collins, Katherine M., Hermann, Katherine L., Oktar, Kerem, Greff, Klaus, Hebart, Martin N., Jacoby, Nori, Zhang, Qiuyi, Marjieh, Raja, Geirhos, Robert, Chen, Sherol, Kornblith, Simon, Rane, Sunayana, Konkle, Talia, O'Connell, Thomas P., Unterthiner, Thomas, Lampinen, Andrew K., Müller, Klaus-Robert, Toneva, Mariya, Griffiths, Thomas L.

arXiv.org Artificial Intelligence, Nov-2-2023

Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the extent to which the representations formed by these diverse systems agree? Do similarities in representations then translate into similar behavior? How can a system's representations be modified to better match those of another system? These questions pertaining to the study of representational alignment are at the heart of some of the most active research areas in cognitive science, neuroscience, and machine learning. For example, cognitive scientists measure the representational alignment of multiple individuals to identify shared cognitive priors, neuroscientists align fMRI responses from multiple individuals into a shared representational space for group-level analyses, and ML researchers distill knowledge from teacher models into student models by increasing their alignment. Unfortunately, there is limited knowledge transfer between research communities interested in representational alignment, so progress in one field often ends up being rediscovered independently in another. Thus, greater cross-field communication would be advantageous. To improve communication between these fields, we propose a unifying framework that can serve as a common language between researchers studying representational alignment. We survey the literature from all three fields and demonstrate how prior work fits into this framework. Finally, we lay out open problems in representational alignment where progress can benefit all three of these fields. We hope that our work can catalyze cross-disciplinary collaboration and accelerate progress for all communities studying and developing information processing systems. We note that this is a working paper and encourage readers to reach out with their suggestions for future revisions.
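
As one concrete way to "measure the extent to which the representations formed by these diverse systems agree", the sketch below computes linear centered kernel alignment (CKA) between two sets of activations recorded on the same stimuli. CKA is only one of many possible alignment measures and is chosen here as an assumption for illustration; it is not presented as the framework proposed in the paper.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representations X (n x d1) and Y (n x d2) of the same n stimuli."""
    X = X - X.mean(axis=0)                            # centre each feature
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2       # similarity between the two systems
    self_x = np.linalg.norm(X.T @ X, "fro")           # normalisers
    self_y = np.linalg.norm(Y.T @ Y, "fro")
    return float(cross / (self_x * self_y))
```

The score is 1 when one representation is an orthogonal transformation or isotropic rescaling of the other, and it tends to be much lower for unrelated representations.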

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2310.13018

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Texas (0.14)
North America > United States > Pennsylvania (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

Controlling Text-to-Image Diffusion by Orthogonal Finetuning

Qiu, Zeju, Liu, Weiyang, Feng, Haiwen, Xue, Yuxuan, Feng, Yao, Liu, Zhen, Zhang, Dan, Weller, Adrian, Schölkopf, Bernhard

arXiv.org Artificial Intelligence, Oct-26-2023

Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce a principled finetuning method -- Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT) which imposes an additional radius constraint to the hypersphere. Specifically, we consider two important finetuning text-to-image tasks: subject-driven generation where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
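
The claim that an orthogonal transformation preserves hyperspherical energy can be checked numerically. The sketch below assumes one common form of the energy, the sum of inverse pairwise distances between unit-normalised neurons; the exact definition used in the paper is not given in the abstract, and the function names are hypothetical.

```python
import numpy as np

def hyperspherical_energy(W, eps=1e-12):
    """Sum of inverse pairwise distances between unit-normalised neurons (rows of W)."""
    W_hat = W / np.linalg.norm(W, axis=1, keepdims=True)   # project neurons onto the unit sphere
    diff = W_hat[:, None, :] - W_hat[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    upper = np.triu_indices(len(W), k=1)                    # each unordered pair once
    return float(np.sum(1.0 / (dist[upper] + eps)))

def random_orthogonal(d, seed=0):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return Q
```

Applying the same orthogonal matrix `R` to every neuron (`W @ R.T`) leaves all pairwise angles, and therefore the energy, unchanged, whereas an unconstrained additive update generally changes it.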

artificial intelligence, machine learning, text prompt, (17 more...)

arXiv.org Artificial Intelligence

2306.0728

Country:

Europe > Germany (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Amortised Inference in Neural Networks for Small-Scale Probabilistic Meta-Learning

Ashman, Matthew, Rochussen, Tommy, Weller, Adrian

arXiv.org Machine Learning, Oct-24-2023

In many machine learning applications, well-calibrated posterior predictive distributions are required for a number of closely-related datasets. Given similarity between datasets, it is natural to wish to develop meta-learning algorithms that utilise other datasets to reduce the computational complexity and / or improve predictive performance when deploying models on newly-seen datasets at test time. There have been a number of significant recent developments in meta-learning for predictive distributions, most notably that of the neural process (NP) family (Garnelo et al., 2018a,b; Foong et al., 2020; Gordon et al., 2018, 2019). Despite the utility of these methods on large-scale meta-datasets, they perform poorly in settings where the number of datasets and the total number of datapoints is small. We argue that this is a result of the large number of shared model parameters overfitting to the meta-dataset.

approximation, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2310.15786

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

AI for Mathematics: A Cognitive Science Perspective

Zhang, Cedegao E., Collins, Katherine M., Weller, Adrian, Tenenbaum, Joshua B.

arXiv.org Artificial Intelligence, Oct-18-2023

Mathematics is one of the most powerful conceptual systems developed and used by the human species. Dreams of automated mathematicians have a storied history in artificial intelligence (AI). Rapid progress in AI, particularly propelled by advances in large language models (LLMs), has sparked renewed, widespread interest in building such systems. In this work, we reflect on these goals from a cognitive science perspective. We call attention to several classical and ongoing research directions from cognitive science, which we believe are valuable for AI practitioners to consider when seeking to build truly human (or superhuman)-level mathematical systems. We close with open discussions and questions that we believe necessitate a multi-disciplinary perspective -- cognitive scientists working in tandem with AI researchers and mathematicians -- as we move toward better mathematical AI systems which not only help us push the frontier of mathematics, but also offer glimpses into how we as humans are even capable of such great cognitive feats.

cognitive science perspective, large language model, natural language, (2 more...)

arXiv.org Artificial Intelligence

2310.13021

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Cognitive Architectures (0.80)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Pairwise Similarity Learning is SimPLE

Wen, Yandong, Liu, Weiyang, Feng, Yao, Raj, Bhiksha, Singh, Rita, Weller, Adrian, Black, Michael J., Schölkopf, Bernhard

arXiv.org Artificial Intelligence, Oct-13-2023

In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL). PSL subsumes a wide range of important applications, such as open-set face recognition, speaker verification, image retrieval and person re-identification. The goal of PSL is to learn a pairwise similarity function assigning a higher similarity score to positive pairs (i.e., a pair of samples with the same label) than to negative pairs (i.e., a pair of samples with different labels). We start by identifying a key desideratum for PSL, and then discuss how existing methods can achieve this desideratum. We then propose a surprisingly simple proxy-free method, called SimPLE, which requires neither feature/proxy normalization nor angular margin and yet is able to generalize well in open-set recognition. We apply the proposed method to three challenging PSL tasks: open-set face recognition, image retrieval and speaker verification. Comprehensive experimental results on large-scale benchmarks show that our method performs significantly better than current state-of-the-art methods.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2310.09449

Country:

Europe (0.14)
Asia (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology (0.46)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Universal Graph Random Features

Reid, Isaac, Choromanski, Krzysztof, Berger, Eli, Weller, Adrian

arXiv.org Machine Learning, Oct-10-2023

We propose a novel random walk-based algorithm for unbiased estimation of arbitrary functions of a weighted adjacency matrix, coined universal graph random features (u-GRFs). This includes many of the most popular examples of kernels defined on the nodes of a graph. Our algorithm enjoys subquadratic time complexity with respect to the number of nodes, overcoming the notoriously prohibitive cubic scaling of exact graph kernel evaluation. It can also be trivially distributed across machines, permitting learning on much larger networks. At the heart of the algorithm is a modulation function which upweights or downweights the contribution from different random walks depending on their lengths. We show that by parameterising it with a neural network we can obtain u-GRFs that give higher-quality kernel estimates or perform efficient, scalable kernel learning. We provide robust theoretical analysis and support our findings with experiments including pointwise estimation of fixed graph kernels, solving non-homogeneous graph ordinary differential equations, node clustering and kernel regression on triangular meshes.
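
The sketch below illustrates the general idea of estimating a function of a weighted adjacency matrix with halting random walks and a length-dependent modulation function. It is a simplified Monte Carlo estimator of `sum_k f(k) * W^k`, written under stated assumptions (uniform neighbour sampling, a fixed halting probability, a dense NumPy matrix, hypothetical function and parameter names), and it is not the authors' u-GRF implementation.

```python
import numpy as np

def random_walk_features(W, f, n_walks=1000, p_halt=0.5, seed=0):
    """Estimate sum_k f(k) * W^k row by row using importance-weighted random walks.

    W is a weighted adjacency matrix and f(k) is the modulation function, which
    weights the contribution of a walk prefix according to its length k.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    neighbours = [np.flatnonzero(W[i]) for i in range(n)]
    phi = np.zeros((n, n))
    for start in range(n):
        for _ in range(n_walks):
            node, load, k = start, 1.0, 0
            while True:
                phi[start, node] += load * f(k)       # deposit the modulated load at this node
                degree = len(neighbours[node])
                # halt with probability p_halt, or when the walk cannot continue
                if degree == 0 or rng.random() < p_halt:
                    break
                nxt = neighbours[node][rng.integers(degree)]
                # importance weight: edge weight divided by the probability of this step
                load *= W[node, nxt] * degree / (1.0 - p_halt)
                node, k = nxt, k + 1
    return phi / n_walks
```

For example, with `f = lambda k: 1.0` and a matrix whose spectral radius is below 1, the estimate approaches `inv(I - W)` as the number of walks grows; each start node's walks are independent, so the outer loop could be distributed across machines.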

artificial intelligence, kernel, machine learning, (17 more...)

arXiv.org Machine Learning

2310.04859

Country:

Oceania > Australia > New South Wales (0.14)
North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Yu, Longhui, Jiang, Weisen, Shi, Han, Yu, Jincheng, Liu, Zhengying, Zhang, Yu, Kwok, James T., Li, Zhenguo, Weller, Adrian, Liu, Weiyang

arXiv.org Artificial Intelligence, Oct-9-2023

Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite the great success, most existing open-source LLMs (e.g., LLaMA-2) are still far from satisfactory for solving mathematical problems due to the complex reasoning procedures. To bridge this gap, we propose MetaMath, a finetuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions by rewriting the question from multiple perspectives, which results in a new dataset called MetaMathQA. Experimental results on two popular benchmarks (i.e., GSM8K and MATH) for mathematical reasoning demonstrate that MetaMath outperforms a suite of open-source LLMs by a significant margin. Our MetaMath-7B model achieves 66.5% on GSM8K and 19.8% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%. Particularly, MetaMath-70B achieves an accuracy of 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. We release the MetaMathQA dataset, the MetaMath models with different model sizes and the training code for public use. Example question: "What is the total amount that James paid when he purchased 5 packs of beef, each weighing 4 pounds, at a price of $5.50 per pound?"

large language model, machine learning, preprint arxiv, (19 more...)

arXiv.org Artificial Intelligence

2309.12284

Country: Europe (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
