Murphy, Kevin
Cooperative Modular Manipulation with Numerous Cable-Driven Robots for Assistive Construction and Gap Crossing
Murphy, Kevin, Soares, Joao C. V., Yim, Justin K., Nottage, Dustin, Soylemezoglu, Ahmet, Ramos, Joao
Soldiers in the field often need to cross negative obstacles, such as rivers or canyons, to reach goals or safety. Military gap crossing involves the on-site construction of temporary bridges, a procedure that requires dangerous, time- and labor-intensive operations and specialized machinery. We envision a scalable robotic solution, inspired by advances in force control and Cable-Driven Parallel Robots (CDPRs), that can address the challenges inherent in this transportation problem and achieve fast, efficient, and safe deployment and field operations. We introduce this vision embodied in Co3MaNDR, a solution to the military gap crossing problem: a distributed robot consisting of several modules that simultaneously pull on a central payload, controlling the cables' tensions to achieve complex objectives such as precise trajectory tracking or force amplification. Hardware experiments demonstrate teleoperation of a payload, trajectory following, and the sensing and amplification of an operator's applied physical forces during slow operations. An operator was shown to manipulate a 27.2 kg (60 lb) payload with an average force utilization of 14.5% of its weight. Results indicate that the system can be scaled up to heavier payloads without compromising performance or introducing superfluous complexity. This research lays a foundation for expanding CDPR technology to uncoordinated and unstable mobile platforms in unknown environments.
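A rough flavor of the force-amplification idea: the modules must choose cable tensions, within actuator limits, whose net wrench supports the payload and scales up the operator's measured force. The sketch below is a hypothetical illustration (the module count, geometry, limits, and gain are invented here, not taken from the paper), posed as a bounded least-squares problem.

```python
import numpy as np
from scipy.optimize import lsq_linear

def allocate_tensions(U, w, t_min=5.0, t_max=200.0):
    """Solve U @ t = w for cable tensions t within actuator limits."""
    return lsq_linear(U, w, bounds=(t_min, t_max)).x

# Example: four modules pulling a 27.2 kg payload in the vertical plane.
m, g = 27.2, 9.81
angles = np.deg2rad([30, 70, 110, 150])          # cable departure angles
U = np.vstack([np.cos(angles), np.sin(angles)])  # 2 x 4 direction matrix
f_op = np.array([3.0, 0.0])                      # measured operator force (N)
gain = 6.0                                       # force-amplification gain
w = np.array([0.0, m * g]) + gain * f_op         # desired net cable wrench
print(allocate_tensions(U, w))
```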
BlackJAX: Composable Bayesian inference in JAX
Cabezas, Alberto, Corenflos, Adrien, Lao, Junpeng, Louf, Rémi, Carnec, Antoine, Chaudhari, Kaustubh, Cohn-Gordon, Reuben, Coullon, Jeremie, Deng, Wei, Duffield, Sam, Durán-Martín, Gerardo, Elantkowski, Marcin, Foreman-Mackey, Dan, Gregori, Michele, Iguaran, Carlos, Kumar, Ravin, Lysy, Martin, Murphy, Kevin, Orduz, Juan Camilo, Patel, Karm, Wang, Xi, Zinkov, Rob
BlackJAX is a library implementing sampling and variational inference algorithms commonly used in Bayesian computation. It is designed for ease of use, speed, and modularity by taking a functional approach to the algorithms' implementation. BlackJAX is written in Python, using JAX to compile and run NumPy-like samplers and variational methods on CPUs, GPUs, and TPUs. The library integrates well with probabilistic programming languages by working directly with the (un-normalized) target log-density function. BlackJAX is intended as a collection of low-level, composable implementations of basic statistical 'atoms' that can be combined to perform well-defined Bayesian inference, but it also provides high-level routines for ease of use. It is designed for users who need cutting-edge methods, researchers who want to create complex sampling methods, and people who want to learn how these work.
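For context, a minimal sketch of the functional style the abstract describes, based on BlackJAX's documented NUTS interface (exact signatures may vary across versions): the kernel is a pure function of its state, so the sampling loop is an ordinary jax.lax.scan.

```python
import jax
import jax.numpy as jnp
import blackjax

# Un-normalized target: standard 2D Gaussian.
logdensity = lambda x: -0.5 * jnp.sum(x ** 2)

# Build a NUTS kernel; in practice step_size and inverse_mass_matrix
# would come from an adaptation phase such as blackjax.window_adaptation.
nuts = blackjax.nuts(logdensity, step_size=0.5,
                     inverse_mass_matrix=jnp.ones(2))

state = nuts.init(jnp.zeros(2))

def step(state, key):
    state, info = nuts.step(key, state)
    return state, state.position

keys = jax.random.split(jax.random.PRNGKey(0), 1000)
_, samples = jax.lax.scan(step, state, keys)
print(samples.mean(axis=0))
```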
Beyond Invariance: Test-Time Label-Shift Adaptation for Distributions with "Spurious" Correlations
Sun, Qingyao, Murphy, Kevin, Ebrahimi, Sayna, D'Amour, Alexander
Changes in the data distribution at test time can have deleterious effects on the performance of predictive models $p(y|x)$. We consider situations where there are additional meta-data labels (such as group labels), denoted by $z$, that can account for such changes in the distribution. In particular, we assume that the prior distribution $p(y, z)$, which models the dependence between the class label $y$ and the "nuisance" factors $z$, may change across domains, either due to a change in the correlation between these terms, or a change in one of their marginals. However, we assume that the generative model for features $p(x|y,z)$ is invariant across domains. We note that this corresponds to an expanded version of the widely used "label shift" assumption, where the labels now also include the nuisance factors $z$. Based on this observation, we propose a test-time label shift correction that adapts to changes in the joint distribution $p(y, z)$ using EM applied to unlabeled samples from the target domain distribution, $p_t(x)$. Importantly, we are able to avoid fitting a generative model $p(x|y, z)$, and merely need to reweight the outputs of a discriminative model $p_s(y, z|x)$ trained on the source distribution. We evaluate our method, which we call "Test-Time Label-Shift Adaptation" (TTLSA), on several standard image and text datasets, as well as the CheXpert chest X-ray dataset, and show that it improves performance over methods that target invariance to changes in the distribution, as well as baseline empirical risk minimization methods. Code for reproducing experiments is available at https://github.com/nalzok/test-time-label-shift.
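The reweighting step the abstract describes can be sketched in a few lines: EM alternates between reweighting the source model's joint posteriors $p_s(y, z|x)$ by the current prior ratio and re-estimating the target prior. The following is a schematic numpy sketch of that idea, not the authors' released code; the adapted prediction $p_t(y|x)$ then follows by marginalizing $z$ out of the adapted posterior.

```python
import numpy as np

def em_target_prior(probs_s, prior_s, n_iters=100):
    """EM estimate of the target prior p_t(y, z) from unlabeled data.

    probs_s: (n, k) source posteriors p_s(y, z | x) over the k joint
             (y, z) classes, one row per unlabeled target example.
    prior_s: (k,) source prior p_s(y, z).
    """
    prior_t = prior_s.copy()
    for _ in range(n_iters):
        # E-step: reweight source posteriors by the prior ratio.
        w = probs_s * (prior_t / prior_s)
        probs_t = w / w.sum(axis=1, keepdims=True)
        # M-step: new prior = average adapted posterior.
        prior_t = probs_t.mean(axis=0)
    return prior_t, probs_t  # p_t(y|x) follows by summing z out of probs_t
```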
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Yu, Lijun, Cheng, Yong, Wang, Zhiruo, Kumar, Vivek, Macherey, Wolfgang, Huang, Yanping, Ross, David A., Essa, Irfan, Bisk, Yonatan, Yang, Ming-Hsuan, Murphy, Kevin, Hauptmann, Alexander G., Jiang, Lu
In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos. SPAE converts between raw pixels and interpretable lexical tokens (or words) extracted from the LLM's vocabulary. The resulting tokens capture both the semantic meaning and the fine-grained details needed for visual reconstruction, effectively translating the visual content into a language comprehensible to the LLM, and empowering it to perform a wide array of multimodal tasks. Our approach is validated through in-context learning experiments with frozen PaLM 2 and GPT 3.5 on a diverse set of image understanding and generation tasks. Our method marks the first successful attempt to enable a frozen LLM to generate image content while surpassing state-of-the-art performance in image understanding tasks, under the same setting, by over 25%.
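The core conversion can be caricatured as nearest-neighbor quantization against the frozen LLM's token embeddings. SPAE's actual encoder is a learned pyramid model producing tokens at multiple scales; the toy sketch below (all names hypothetical) only illustrates the idea of snapping visual features to vocabulary entries.

```python
import numpy as np

def quantize_to_lexical_tokens(feats, token_emb, vocab):
    """Toy illustration: map continuous patch features to the nearest
    entries of a frozen LLM's token-embedding table.

    feats: (n, d) patch features; token_emb: (V, d); vocab: list of V strings.
    """
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = (feats ** 2).sum(1, keepdims=True) \
         + (token_emb ** 2).sum(1) - 2 * feats @ token_emb.T
    ids = d2.argmin(axis=1)
    return [vocab[i] for i in ids]
```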
Low-rank extended Kalman filtering for online learning of neural networks from streaming data
Chang, Peter G., Durán-Martín, Gerardo, Shestopaloff, Alexander Y, Jones, Matt, Murphy, Kevin
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream. The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix, giving a per-step cost that is linear in the number of model parameters. In contrast to methods based on stochastic variational inference, our method is fully deterministic and does not require step-size tuning. We show experimentally that this yields much faster (more sample-efficient) learning, more rapid adaptation to changing distributions, and faster accumulation of reward when used as part of a contextual bandit algorithm.
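As a rough sketch of the low-rank-plus-diagonal idea: an EKF measurement update adds $H^T R^{-1} H$ to the posterior precision, and a rank budget can be enforced by appending the corresponding factor and truncating with an SVD. The snippet below is an illustrative approximation under those assumptions, not the paper's algorithm; it omits the mean update and the handling of discarded directions.

```python
import numpy as np

def lowrank_precision_update(W, H, R_inv_sqrt, rank):
    """Illustrative low-rank precision update (not the paper's algorithm).

    The posterior precision is approximated as diag + W @ W.T. An exact
    EKF measurement update would add H.T @ inv(R) @ H; here we append the
    corresponding factor and project back to the rank budget with an SVD.
    """
    W_ext = np.hstack([W, H.T @ R_inv_sqrt])    # exact rank-augmented factor
    U, s, _ = np.linalg.svd(W_ext, full_matrices=False)
    # (A fuller treatment folds the discarded energy into the diagonal
    # term and applies the matching mean update.)
    return U[:, :rank] * s[:rank]               # keep top singular directions
```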
Muse: Text-To-Image Generation via Masked Generative Transformers
Chang, Huiwen, Zhang, Han, Barber, Jarred, Maschinot, AJ, Lezama, Jose, Jiang, Lu, Yang, Ming-Hsuan, Murphy, Kevin, Freeman, William T., Rubinstein, Michael, Li, Yuanzhen, Krishnan, Dilip
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality, etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at http://muse-model.github.io.
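Muse's parallel decoding follows the MaskGIT recipe: repeatedly predict all masked tokens, commit the most confident ones, and leave the rest masked on a cosine schedule. Below is a schematic version in which predict_fn is a hypothetical stand-in for the Transformer's per-position token distribution.

```python
import numpy as np

def parallel_decode(predict_fn, n_tokens, n_steps=12):
    """Confidence-based parallel decoding in the style of MaskGIT/Muse.

    predict_fn(tokens, mask) stands in for the masked Transformer: it
    returns per-position token probabilities, shape (n_tokens, vocab).
    """
    tokens = np.zeros(n_tokens, dtype=int)
    mask = np.ones(n_tokens, dtype=bool)           # True = still masked
    for step in range(1, n_steps + 1):
        probs = predict_fn(tokens, mask)
        best = probs.argmax(axis=1)
        conf = np.where(mask, probs.max(axis=1), -np.inf)
        # Cosine schedule: how many tokens may remain masked afterwards.
        target = int(n_tokens * np.cos(np.pi / 2 * step / n_steps))
        n_commit = max(int(mask.sum()) - target, 0)
        idx = np.argsort(-conf)[:n_commit]         # most confident slots
        tokens[idx] = best[idx]
        mask[idx] = False
    return tokens
```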
Uncertainty Disentanglement with Non-stationary Heteroscedastic Gaussian Processes for Active Learning
Patel, Zeel B, Batra, Nipun, Murphy, Kevin
Gaussian processes are Bayesian non-parametric models used in many areas. In this work, we propose a Non-stationary Heteroscedastic Gaussian process model that can be learned with gradient-based techniques. We demonstrate the interpretability of the proposed model by separating the overall uncertainty into aleatoric (irreducible) and epistemic (model) uncertainty. We illustrate the usefulness of the derived epistemic uncertainty on active learning problems, and demonstrate the efficacy of our model through ablations on multiple datasets.
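The uncertainty split falls out of the predictive distribution: the posterior variance of the latent function is the epistemic part, and the input-dependent noise variance is the aleatoric part. A minimal sketch, assuming kernel and log_noise_fn are given callables (in the paper both carry learnable, input-dependent parameters):

```python
import numpy as np

def het_gp_predict(X, y, Xs, kernel, log_noise_fn):
    """Heteroscedastic GP predictions with an uncertainty split.

    kernel(A, B) -> covariance matrix; log_noise_fn(X) -> per-point log
    noise variance.
    """
    K = kernel(X, X) + np.diag(np.exp(log_noise_fn(X)))
    Ks, Kss = kernel(X, Xs), kernel(Xs, Xs)
    A = np.linalg.solve(K, Ks)
    mean = A.T @ y
    epistemic = np.diag(Kss - Ks.T @ A)        # reducible: shrinks with data
    aleatoric = np.exp(log_noise_fn(Xs))       # irreducible observation noise
    return mean, epistemic, aleatoric

# Active learning would then query points with the largest epistemic term.
```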
Language Model Cascades
Dohan, David, Xu, Winnie, Lewkowycz, Aitor, Austin, Jacob, Bieber, David, Lopes, Raphael Gontijo, Wu, Yuhuai, Michalewski, Henryk, Saurous, Rif A., Sohl-dickstein, Jascha, Murphy, Kevin, Sutton, Charles
Prompted models have demonstrated impressive few-shot learning abilities. Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities. These compositions are probabilistic models, and may be expressed in the language of graphical models with random variables whose values are complex data types such as strings. Cases with control flow and dynamic structure require techniques from probabilistic programming, which allow implementing disparate model structures and inference strategies in a unified language. We formalize several existing techniques from this perspective, including scratchpads / chain of thought, verifiers, STaR, selection-inference, and tool use. In this position paper, we argue that a useful unifying framework for understanding and extending this disparate body of work is in terms of probabilistic programming languages (PPL) extended to work with strings, instead of more atomic data types like integers and floats. That is, we use a PPL to define a joint probability model on string-valued random variables, parameterized using LMs, and then condition this model on string-valued observations in order to compute a posterior over string-valued unknowns, which we can then infer. We call such a probabilistic program a language model cascade. We show that this framework captures many recent approaches, and also allows us to tackle more complex multi-step reasoning problems.
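To make the idea concrete, a cascade is just a program that samples string-valued random variables from LMs and can be conditioned on observed strings. A toy chain-of-thought cascade, with lm_sample a hypothetical stand-in for a real LM call:

```python
def lm_sample(prompt: str) -> str:
    """Hypothetical stand-in for drawing one LM continuation of `prompt`."""
    return "<sample given: " + prompt.splitlines()[-1] + ">"

def chain_of_thought_cascade(question: str) -> dict:
    """Sample (thought, answer) ~ p(T, A | Q): a two-variable cascade."""
    thought = lm_sample(f"Q: {question}\nReasoning:")
    answer = lm_sample(f"Q: {question}\nReasoning: {thought}\nA:")
    return {"question": question, "thought": thought, "answer": answer}

# Conditioning on an observed answer (as in verifiers or STaR) amounts to
# inference over `thought`, e.g. by rejection or importance sampling over
# repeated runs of this program.
```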
Population-Based Black-Box Optimization for Biological Sequence Design
Angermueller, Christof, Belanger, David, Gane, Andreea, Mariet, Zelda, Dohan, David, Murphy, Kevin, Colwell, Lucy, Sculley, D
The use of black-box optimization for the design of new biological sequences is an emerging research area with potentially revolutionary impact. The cost and latency of wet-lab experiments require methods that find good sequences in few experimental rounds of large batches of sequences--a setting that off-the-shelf black-box optimization methods are ill-equipped to handle. We find that the performance of existing methods varies drastically across optimization tasks, posing a significant obstacle to real-world applications. To improve robustness, we propose Population-Based Black-Box Optimization (P3BO), which generates batches of sequences by sampling from an ensemble of methods. The number of sequences sampled from any method is proportional to the quality of sequences it previously proposed, allowing P3BO to combine the strengths of individual methods while hedging against their innate brittleness. Adapting the hyper-parameters of each of the methods online using evolutionary optimization further improves performance. Through extensive experiments on in-silico optimization tasks, we show that P3BO outperforms any single method in its population, proposing higher quality sequences as well as more diverse batches. As such, P3BO and Adaptive-P3BO are a crucial step towards deploying ML to real-world sequence design.
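The sampling rule at the heart of P3BO can be sketched compactly: each batch slot is drawn from a method with probability proportional to the (decayed) quality of its past proposals. The snippet below is schematic; the paper's credit assignment and Adaptive-P3BO's evolutionary hyper-parameter adaptation are more involved.

```python
import numpy as np

def p3bo_propose_batch(methods, credit, batch_size, seed=0):
    """Schematic population-based proposal step.

    methods: list of objects exposing .propose() -> candidate sequence.
    credit:  (len(methods),) running scores, updated elsewhere from the
             measured quality of each method's past proposals (with decay).
    """
    rng = np.random.default_rng(seed)
    p = np.exp(credit - credit.max())
    p /= p.sum()                        # softmax over accumulated credit
    picks = rng.choice(len(methods), size=batch_size, p=p)
    return [methods[i].propose() for i in picks]
```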
Machine Learning on Graphs: A Model and Comprehensive Taxonomy
Chami, Ines, Abu-El-Haija, Sami, Perozzi, Bryan, Ré, Christopher, Murphy, Kevin
There has been a surge of recent interest in learning representations for graph-structured data. Graph representation learning methods have generally fallen into three main categories, based on the availability of labeled data. The first, network embedding (such as shallow graph embedding or graph auto-encoders), focuses on learning unsupervised representations of relational structure. The second, graph regularized neural networks, leverages graphs to augment neural network losses with a regularization objective for semi-supervised learning. The third, graph neural networks, aims to learn differentiable functions over discrete topologies with arbitrary structure. However, despite the popularity of these areas, there has been surprisingly little work on unifying the three paradigms. Here, we aim to bridge the gap between graph neural networks, network embedding and graph regularization models. We propose a comprehensive taxonomy of representation learning methods for graph-structured data, aiming to unify several disparate bodies of work. Specifically, we propose a Graph Encoder Decoder Model (GRAPHEDM), which generalizes popular algorithms for semi-supervised learning on graphs (e.g. GraphSage, Graph Convolutional Networks, Graph Attention Networks), and unsupervised learning of graph representations (e.g. DeepWalk, node2vec, etc.) into a single consistent approach. To illustrate the generality of this approach, we fit over thirty existing methods into this framework. We believe that this unifying view both provides a solid foundation for understanding the intuition behind these methods, and enables future research in the area.
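The GRAPHEDM template itself is easy to state in code: an encoder maps the graph to embeddings Z, a decoder scores node pairs, and the training objective mixes supervised, graph-regularization, and reconstruction terms. The toy instance below (a one-layer graph convolution with a dot-product decoder; not the paper's code) shows how shallow-embedding, regularization, and GNN methods become special cases of which encoder and loss terms are used.

```python
import numpy as np

def graphedm_objective(W, X, Theta, alpha=1.0):
    """Toy instance of the GRAPHEDM template.

    W: (n, n) adjacency, X: (n, d) node features, Theta: (d, k) encoder
    weights. A semi-supervised variant would add a label loss on Z.
    """
    A = W + np.eye(len(W))                         # self-loops
    d = A.sum(1)
    A_norm = A / np.sqrt(np.outer(d, d))           # symmetric normalization
    Z = np.tanh(A_norm @ X @ Theta)                # encoder (one-layer GCN)
    S = Z @ Z.T                                    # decoder (dot product)
    recon = ((S - W) ** 2).mean()                  # network-embedding term
    diff = Z[:, None, :] - Z[None, :, :]
    smooth = (W * (diff ** 2).sum(-1)).mean()      # Laplacian regularizer
    return recon + alpha * smooth
```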