Fifty, Christopher
Restructuring Vector Quantization with the Rotation Trick
Fifty, Christopher, Junkins, Ronald G., Duan, Dennis, Iyengar, Aniketh, Liu, Jerry W., Amid, Ehsan, Thrun, Sebastian, Ré, Christopher
Vector Quantized Variational AutoEncoders (VQ-VAEs) are designed to compress a continuous input to a discrete latent space and reconstruct it with minimal distortion. They operate by maintaining a set of vectors--often referred to as the codebook--and quantizing each encoder output to the nearest vector in the codebook. However, as vector quantization is non-differentiable, the gradient to the encoder flows around the vector quantization layer rather than through it in a straight-through approximation. This approximation may be undesirable as all information from the vector quantization operation is lost. In this work, we propose a way to propagate gradients through the vector quantization layer of VQ-VAEs. We smoothly transform each encoder output into its corresponding codebook vector via a rotation and rescaling linear transformation that is treated as a constant during backpropagation. As a result, the relative magnitude and angle between encoder output and codebook vector become encoded into the gradient as it propagates through the vector quantization layer and back to the encoder. Across 11 different VQ-VAE training paradigms, we find this restructuring improves reconstruction metrics, codebook utilization, and quantization error.

Vector quantization (Gray, 1984) is an approach to discretize a continuous vector space. It defines a finite set of vectors--referred to as the codebook--and maps any vector in the continuous vector space to the closest vector in the codebook.
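A minimal PyTorch sketch of this idea, assuming encoder outputs e and their nearest codebook vectors q arrive as (batch, dim) tensors; the function name rotation_trick and the eps stabilizer are illustrative, not the paper's reference implementation:

    import torch

    def rotation_trick(e, q, eps=1e-6):
        """Map each encoder output e onto its nearest codebook vector q through a
        rotation and rescaling that are treated as constants during backpropagation,
        so the gradient to the encoder carries the relative angle and magnitude
        between e and q. e, q: (batch, dim); q is the nearest codebook entry."""
        e_norm = e.norm(dim=-1, keepdim=True) + eps
        q_norm = q.norm(dim=-1, keepdim=True) + eps
        e_hat = (e / e_norm).detach()
        q_hat = (q / q_norm).detach()
        r = e_hat + q_hat                                # degenerates when e ~ -q; eps keeps it finite
        r = r / (r.norm(dim=-1, keepdim=True) + eps)
        scale = (q_norm / e_norm).detach()
        # R e = e - 2 r (r^T e) + 2 q_hat (e_hat^T e). With R and scale held constant,
        # the forward value is (up to eps) q, while d(output)/d(e) = scale * R.
        Re = (e
              - 2 * r * (r * e).sum(dim=-1, keepdim=True)
              + 2 * q_hat * (e_hat * e).sum(dim=-1, keepdim=True))
        return scale * Re

No gradient flows to the codebook through this path; as in standard VQ-VAE training, codebook vectors would still be updated by a codebook or commitment loss (or an EMA rule).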
Context-Aware Meta-Learning
Fifty, Christopher, Duan, Dennis, Junkins, Ronald G., Amid, Ehsan, Leskovec, Jure, Ré, Christopher, Thrun, Sebastian
Large Language Models like ChatGPT demonstrate a remarkable capacity to learn new concepts during inference without any fine-tuning. However, visual models trained to detect new objects during inference have been unable to replicate this ability, and instead either perform poorly or require meta-training and/or fine-tuning on similar objects. In this work, we propose a meta-learning algorithm that emulates Large Language Models by learning new visual concepts during inference without fine-tuning. Our approach leverages a frozen pre-trained feature extractor, and analogous to in-context learning, recasts meta-learning as sequence modeling over datapoints with known labels and a test datapoint with an unknown label. On 8 out of 11 meta-learning benchmarks, our approach -- without meta-training or fine-tuning -- exceeds or matches the state-of-the-art algorithm, P>M>F, which is meta-trained on these benchmarks.
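A rough sketch of this recasting, assuming a frozen pre-trained extractor has already produced support and query embeddings; the class name SequenceMetaLearner, the simple label-embedding scheme, and all hyperparameters are illustrative placeholders rather than the paper's exact architecture:

    import torch
    import torch.nn as nn

    class SequenceMetaLearner(nn.Module):
        """Few-shot classification as sequence modeling: each support token
        concatenates a frozen image embedding with a label embedding; the query
        token uses a learned "unknown" label, and a Transformer encoder attends
        over the whole sequence to classify the query."""
        def __init__(self, feat_dim=512, n_way=5, label_dim=64, n_heads=8, n_layers=4):
            super().__init__()
            self.unknown = n_way                         # extra index reserved for the unknown label
            self.label_emb = nn.Embedding(n_way + 1, label_dim)
            d_model = feat_dim + label_dim               # must be divisible by n_heads
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, n_way)

        def forward(self, support_feats, support_labels, query_feat):
            # support_feats: (B, k, feat_dim); support_labels: (B, k) ints in [0, n_way);
            # query_feat: (B, feat_dim). All features come from the frozen extractor.
            sup = torch.cat([support_feats, self.label_emb(support_labels)], dim=-1)
            unk = self.label_emb(torch.full_like(support_labels[:, :1], self.unknown))
            qry = torch.cat([query_feat.unsqueeze(1), unk], dim=-1)
            seq = torch.cat([sup, qry], dim=1)           # (B, k + 1, d_model)
            return self.head(self.encoder(seq)[:, -1])   # class logits for the query datapoint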
In-Context Learning for Few-Shot Molecular Property Prediction
Fifty, Christopher, Leskovec, Jure, Thrun, Sebastian
In-context learning has become an important approach for few-shot learning in Large Language Models because of its ability to rapidly adapt to new tasks without fine-tuning model parameters. However, it is restricted to applications in natural language and inapplicable to other domains. In this paper, we adapt the concepts underpinning in-context learning to develop a new algorithm for few-shot molecular property prediction. Our approach learns to predict molecular properties from a context of (molecule, property measurement) pairs and rapidly adapts to new properties without fine-tuning. On the FS-Mol and BACE molecular property prediction benchmarks, we find this method surpasses the performance of recent meta-learning algorithms at small support sizes and is competitive with the best methods at large support sizes.

In-context learning describes an emergent property of large language models (LLMs) that enables them to solve new tasks from only a few demonstrations and without any gradient updates to the model parameters (Brown et al., 2020). This capacity to rapidly adapt to new tasks contrasts sharply with typical few-shot learning algorithms that use either gradient updates or distance computations to prototypical class centroids to adapt the pre-trained model to the few-shot learning objective. As a result, in-context learning has become a powerful approach for few-shot learning applications in natural language; however, it is inapplicable to other domains as it uses a language modeling objective to train the model. One such domain is molecular science, where few-shot learning is critical to drug discovery. After a biological target has been identified, finding small molecules that inhibit this target may lead to desirable outcomes. For example, inhibiting the protein 15-PGDH with a small molecule inhibitor leads to rejuvenation of aged skeletal muscle tissue in animal studies, effectively reverse-aging the cells (Palla et al., 2021).
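A hedged sketch of the analogous setup for property prediction, where each context token pairs a molecular embedding with its measured property and the query token carries a learned placeholder for the unknown measurement; the class name, dimensions, and simple concatenation scheme are assumptions for illustration, not the paper's model:

    import torch
    import torch.nn as nn

    class MoleculeContextModel(nn.Module):
        """Predict a property for a query molecule from a context of
        (molecule embedding, measured property) pairs, without gradient updates
        at adaptation time: the context is simply a longer input sequence."""
        def __init__(self, mol_dim=1024, d_model=256, n_heads=8, n_layers=4):
            super().__init__()
            self.proj = nn.Linear(mol_dim + 1, d_model)
            self.unknown = nn.Parameter(torch.zeros(1))      # placeholder for the query's measurement
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.out = nn.Linear(d_model, 1)

        def forward(self, context_mols, context_labels, query_mol):
            # context_mols: (B, k, mol_dim); context_labels: (B, k); query_mol: (B, mol_dim)
            ctx = torch.cat([context_mols, context_labels.unsqueeze(-1)], dim=-1)
            unk = self.unknown.expand(query_mol.size(0), 1)
            qry = torch.cat([query_mol, unk], dim=-1).unsqueeze(1)
            seq = self.proj(torch.cat([ctx, qry], dim=1))    # (B, k + 1, d_model)
            return self.out(self.encoder(seq)[:, -1]).squeeze(-1)  # predicted property for the query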
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
Fifty, Christopher, Paggi, Joseph M., Amid, Ehsan, Leskovec, Jure, Dror, Ron
Few-shot learning is a promising approach to molecular property prediction as supervised data is often very limited. However, many important molecular properties depend on complex molecular characteristics -- such as the various 3D geometries a molecule may adopt or the types of chemical interactions it can form -- that are not explicitly encoded in the feature space and must be approximated from low amounts of data. Learning these characteristics can be difficult, especially for few-shot learning algorithms that are designed for fast adaptation to new tasks. In this work, we develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction. Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations, and a multi-task learning paradigm to structure the embedding space. On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance. Our code is available at https://github.com/cfifty/IGNITE.
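One way to realize this multi-task pre-training, sketched under the assumption that molecules are featurized as fixed-length vectors and that docking calculations supply per-task regression targets with missing entries; the module and masked loss below are illustrative, not the released IGNITE code:

    import torch
    import torch.nn as nn

    class DockingMultiTaskEncoder(nn.Module):
        """Shared molecular encoder trained on many synthetic docking-score
        regression tasks, so that the embedding space reflects geometry and
        interaction information; the frozen embeddings then serve as inputs
        to few-shot learners (Multi-Task, MAML, Prototypical Networks)."""
        def __init__(self, in_dim, emb_dim, n_docking_tasks):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(in_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))
            self.heads = nn.ModuleList(nn.Linear(emb_dim, 1) for _ in range(n_docking_tasks))

        def forward(self, x):
            z = self.encoder(x)                                   # shared embedding
            return z, torch.cat([h(z) for h in self.heads], dim=-1)

    def multitask_loss(preds, targets, mask):
        # targets, mask: (B, n_tasks); mask marks which docking scores are available
        err = (preds - targets) ** 2 * mask
        return err.sum() / mask.sum().clamp(min=1)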
Measuring and Harnessing Transference in Multi-Task Learning
Fifty, Christopher, Amid, Ehsan, Zhao, Zhe, Yu, Tianhe, Anil, Rohan, Finn, Chelsea
Multi-task learning can leverage information learned by one task to benefit the training of other tasks. Despite this capacity, naïve formulations often degrade performance; in particular, identifying the tasks that would benefit from co-training remains a challenging design question. In this paper, we analyze the dynamics of information transfer, or transference, across tasks throughout training. Specifically, we develop a similarity measure that can quantify transference among tasks and use this quantity to both better understand the optimization dynamics of multi-task learning as well as improve overall learning performance. In the latter case, we propose two methods to leverage our transference metric. The first operates at a macro-level by selecting which tasks should train together while the second functions at a micro-level by determining how to combine task gradients at each training step. We find these methods can lead to significant improvement over prior work on three supervised multi-task learning benchmarks and one multi-task reinforcement learning paradigm.

Deciding if two or more objectives should be trained together in a multi-task model, as well as choosing how that model's parameters should be shared, is an inherently complex issue often left to human experts (Zhang & Yang, 2017). However, a human's understanding of similarity is motivated by their intuition and experience rather than a prescient knowledge of the underlying structures learned by a neural network.
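The lookahead idea behind the transference measure can be sketched as follows: update the parameters with one task's gradient and observe how another task's loss changes on the same batch. This helper is a simplified illustration (full-model update, plain SGD step), not the authors' implementation:

    import copy
    import torch

    def transference(model, loss_fns, batch, source, target, lr=0.1):
        """Relative change in the target task's loss after a one-step lookahead
        update using only the source task's gradient.
        loss_fns: dict task_name -> callable(model, batch) returning a scalar loss."""
        base = loss_fns[target](model, batch).item()
        lookahead = copy.deepcopy(model)                 # leave the real model untouched
        loss_src = loss_fns[source](lookahead, batch)
        lookahead.zero_grad()
        loss_src.backward()
        with torch.no_grad():
            for p in lookahead.parameters():
                if p.grad is not None:
                    p -= lr * p.grad                     # lookahead SGD step on the source gradient
        after = loss_fns[target](lookahead, batch).item()
        return 1.0 - after / base                        # > 0: the source update helps the target task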
Small Towers Make Big Differences
Wang, Yuyan, Zhao, Zhe, Dai, Bo, Fifty, Christopher, Lin, Dong, Hong, Lichan, Chi, Ed H.
Multi-task learning aims at solving multiple machine learning tasks at the same time. A good solution to a multi-task learning problem should be generalizable in addition to being Pareto optimal. In this paper, we provide insights into the trade-off between Pareto efficiency and generalization that arises from parameterization in multi-task deep learning models. As a multi-objective optimization problem, sufficient parameterization is needed to handle task conflicts in a constrained solution space; however, from a multi-task generalization perspective, over-parameterization undermines the benefit of learning a shared representation, which helps harder tasks or tasks with limited training examples. A delicate balance between multi-task generalization and multi-objective optimization is therefore needed to find a better trade-off between efficiency and generalization. To this end, we propose a method of under-parameterized self-auxiliaries for multi-task models to achieve the best of both worlds. It is task-agnostic and works with other multi-task learning algorithms. Empirical results show that small towers of under-parameterized self-auxiliaries can make big differences in improving Pareto efficiency in various multi-task applications.
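A minimal sketch of how under-parameterized self-auxiliaries might be attached to a standard shared-bottom multi-task model; the tower sizes, auxiliary loss weight, and module names are assumptions for illustration, not the paper's configuration:

    import torch
    import torch.nn as nn

    class MultiTaskWithSmallAuxTowers(nn.Module):
        """Each task keeps its usual tower and also gets an under-parameterized
        auxiliary tower on the shared representation; the auxiliary predictions
        are trained on the same task labels and act as a regularizer on the
        shared bottom rather than as extra capacity at inference time."""
        def __init__(self, in_dim, shared_dim, n_tasks, main_hidden=256, aux_hidden=16):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(in_dim, shared_dim), nn.ReLU())
            self.main = nn.ModuleList(
                nn.Sequential(nn.Linear(shared_dim, main_hidden), nn.ReLU(),
                              nn.Linear(main_hidden, 1)) for _ in range(n_tasks))
            self.aux = nn.ModuleList(                    # "small towers": far fewer parameters
                nn.Sequential(nn.Linear(shared_dim, aux_hidden), nn.ReLU(),
                              nn.Linear(aux_hidden, 1)) for _ in range(n_tasks))

        def forward(self, x):
            z = self.shared(x)
            return [m(z) for m in self.main], [a(z) for a in self.aux]

    def total_loss(main_out, aux_out, targets, criterion, aux_weight=0.1):
        # targets: list of per-task label tensors, one entry per task
        main = sum(criterion(o, t) for o, t in zip(main_out, targets))
        aux = sum(criterion(o, t) for o, t in zip(aux_out, targets))
        return main + aux_weight * aux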
Simplifying Graph Convolutional Networks
Wu, Felix, Zhang, Tianyi, de Souza Jr., Amauri Holanda, Fifty, Christopher, Yu, Tao, Weinberger, Kilian Q.
Graph Convolutional Networks (GCNs) and their variants have received significant attention and have become the de facto methods for learning graph representations. GCNs derive inspiration primarily from recent deep learning approaches, and as a result, may inherit unnecessary complexity and redundant computation. In this paper, we reduce this excess complexity by successively removing nonlinearities and collapsing weight matrices between consecutive layers. We theoretically analyze the resulting linear model and show that it corresponds to a fixed low-pass filter followed by a linear classifier. Notably, our experimental evaluation demonstrates that these simplifications do not negatively impact accuracy in many downstream applications. Moreover, the resulting model scales to larger datasets, is naturally interpretable, and yields up to two orders of magnitude speedup over FastGCN.
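The resulting linear model admits a very short implementation: precompute the K-step propagated features with the symmetrically normalized adjacency (with self-loops) and fit any linear classifier on top. A sketch with dense NumPy arrays (real graphs would use sparse matrices; the function name is illustrative):

    import numpy as np

    def sgc_features(adj, X, k=2):
        """Precompute S^k X, where S = D^{-1/2} (A + I) D^{-1/2} is the normalized
        adjacency with self-loops. adj: (n, n) array; X: (n, d) node features."""
        A = adj + np.eye(adj.shape[0])                        # add self-loops
        d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
        S = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]     # fixed low-pass filter
        for _ in range(k):
            X = S @ X                                         # k rounds of feature propagation
        return X                                              # feed to a logistic-regression classifier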