PDSketch: Integrated Domain Programming, Learning, and Planning
This paper studies a model learning and online planning approach towards building flexible and general robots. Specifically, we investigate how to exploit the locality and sparsity structures in the underlying environmental transition model to improve model generalization, data-efficiency, and runtime-efficiency. We present a new domain definition language, named PDSketch. It allows users to flexibly define high-level structures in the transition models, such as object and feature dependencies, in a way similar to how programmers use TensorFlow or PyTorch to specify kernel sizes and hidden dimensions of a convolutional neural network. The details of the transition model will be filled in by trainable neural networks.
We thank the reviewers for their recognition of our work and helpful comments
We thank the reviewers for their recognition of our work and helpful comments. A: We will modify our paper to cite more related work. Q2/Reviewer#1: "Throw some light on the failure modes". A: We use random uniform initialization on the interval [0, 1). Q6/Reviewer#1: "On the extraction of object features and feature map".
From Stochastic Planning to Marginal MAP
Hao(Jackson) Cui, Radu Marinescu, Roni Khardon
It is well known that the problems of stochastic planning and probabilistic inference are closely related. This paper makes two contributions in this context. The first is to provide an analysis of the recently developed SOGBOFA heuristic planning algorithm that was shown to be effective for problems with large factored state and action spaces. It is shown that SOGBOFA can be seen as a specialized inference algorithm that computes its solutions through a combination of a symbolic variant of belief propagation and gradient ascent. The second contribution is a new solver for Marginal MAP (MMAP) inference. We introduce a new reduction from MMAP to maximum expected utility problems which are suitable for the symbolic computation in SOGBOFA. This yields a novel algebraic gradient-based solver (AGS) for MMAP. An experimental evaluation illustrates the potential of AGS in solving difficult MMAP problems.
Real-World Image Variation by Aligning Diffusion Inversion Chain
Recent diffusion model advancements have enabled high-fidelity images to be generated using text prompts. However, a domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images. Our investigation uncovers that this domain gap originates from a latents' distribution gap in different diffusion processes. To address this issue, we propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL) that utilizes diffusion models to generate image variations from a single image exemplar. Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain. Specifically, we demonstrate that step-wise latent distribution alignment is essential for generating high-quality variations.
Scalable Interpretability via Polynomials
Generalized Additive Models (GAMs) have quickly become the leading choice for interpretable machine learning. However, unlike uninterpretable methods such as DNNs, they lack expressive power and easy scalability, and are hence not a feasible alternative for real-world tasks. We present a new class of GAMs that use tensor rank decompositions of polynomials to learn powerful, inherently-interpretable models. Our approach, titled Scalable Polynomial Additive Models (SPAM) is effortlessly scalable and models all higher-order feature interactions without a combinatorial parameter explosion. SPAM outperforms all current interpretable approaches, and matches DNN/XGBoost performance on a series of real-world benchmarks with up to hundreds of thousands of features. We demonstrate by human subject evaluations that SPAMs are demonstrably more interpretable in practice, and are hence an effortless replacement for DNNs for creating interpretable and high-performance systems suitable for large-scale machine learning.
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections Roy Miles
Large language models (LLMs) have recently emerged as powerful tools for tackling many language-processing tasks. Despite their success, training and finetuning these models is still far too computationally and memory intensive. In this paper, we identify and characterise the important components needed for effective model convergence using gradient descent. In doing so we find that the intermediate activations used to implement backpropagation can be excessively compressed without incurring any degradation in performance. This result leads us to a cheap and memory-efficient algorithm for both fine-tuning and pre-training LLMs. The proposed algorithm simply divides the tokens up into smaller sub-tokens before projecting them onto a fixed 1-dimensional subspace during the forward pass. These features are then coarsely reconstructed during the backward pass to implement the update rules. We confirm the effectiveness of our algorithm as being complimentary to many state-of-the-art PEFT methods on the VTAB-1k fine-tuning benchmark. Furthermore, we outperform QLoRA for fine-tuning LLaMA and show competitive performance against other memory-efficient pre-training methods on the large-scale C4 dataset.
Learning Transferable Features for Implicit Neural Representations
Implicit neural representations (INRs) have demonstrated success in a variety of applications, including inverse problems and neural rendering. An INR is typically trained to capture one signal of interest, resulting in learned neural features that are highly attuned to that signal. Assumed to be less generalizable, we explore the aspect of transferability of such learned neural features for fitting similar signals.
Appendices A Additional Information on Prompts 16 A.1 Words and Word Frequencies 16 A.2 The Distribution of Prompt Types in the Benchmark 17 A.3 The Encoding Scheme for Task 2 Answers
To answer the first question, we split the data into the two groups: the first group contains the subset of data for numeric-simple prompts, and the second group the subset of data for attribute-color prompts. We only consider prompts in both groups that contain the same numbers (1-4) and the same words ("cat", "apple", "koala", "bottle", "mushroom"), to isolate the effect of adding the color term as opposed to potential confounding factors. For example a confounding factor might be the word identity, as a model might be more accurate in generating correct images when the prompt contains the word "dog", and if this word exists only in the first prompt type and not in the second then responses in the first prompt type will on average have higher accuracy that may or may not
Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse Domains
Elliot Meyerson, Risto Miikkulainen
As deep learning applications continue to become more diverse, an interesting question arises: Can general problem solving arise from jointly learning several such diverse tasks? To approach this question, deep multi-task learning is extended in this paper to the setting where there is no obvious overlap between task architectures. The idea is that any set of (architecture,task) pairs can be decomposed into a set of potentially related subproblems, whose sharing is optimized by an efficient stochastic algorithm. The approach is first validated in a classic synthetic multi-task learning benchmark, and then applied to sharing across disparate architectures for vision, NLP, and genomics tasks. It discovers regularities across these domains, encodes them into sharable modules, and combines these modules systematically to improve performance in the individual tasks. The results confirm that sharing learned functionality across diverse domains and architectures is indeed beneficial, thus establishing a key ingredient for general problem solving in the future.