Goto

Collaborating Authors

 nblock




MoRe Fine-Tuning with 10x Fewer Parameters

arXiv.org Artificial Intelligence

Parameter-efficient fine-tuning (PEFT) techniques have unlocked the potential to cheaply and easily specialize large pretrained models. However, the most prominent approaches, like low-rank adapters (LoRA), depend on heuristics or rules-of-thumb for their architectural choices -- potentially limiting their performance for new models and architectures. This limitation suggests that techniques from neural architecture search could be used to obtain optimal adapter architectures, but these are often expensive and difficult to implement. We address this challenge with Monarch Rectangular Fine-tuning (MoRe), a simple framework to search over adapter architectures that relies on the Monarch matrix class. Theoretically, we show that MoRe is more expressive than LoRA. Empirically, our approach is more parameter-efficient and performant than state-of-the-art PEFTs on a range of tasks and models, with as few as 5\% of LoRA's parameters.


Best-First Heuristic Search for Multicore Machines

Journal of Artificial Intelligence Research

To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we compare different approaches to parallel best-first search in a shared-memory setting. We present a new method, PBNF, that uses abstraction to partition the state space and to detect duplicate states without requiring frequent locking. PBNF allows speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, proving its correctness using temporal logic. Our approach is general, allowing it to extend easily to suboptimal and anytime heuristic search. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using 8-core machines, we show that A*, weighted A* and Anytime weighted A* implemented using PBNF yield faster search than improved versions of previous parallel search proposals.


Edge Partitioning in Parallel Structured Duplicate Detection

AAAI Conferences

We show how edge partitioning, a technique originally developed for external-memory search, can be used to reduce the number of slow synchronization operations needed in parallel graph search. We show that edge partitioning improves on a previous technique called parallel structured duplicate detection by allowing a higher degree of concurrency, even for search problems with little or no inherent locality. For domain-independent graph search, we also show that edge partitioning significantly improves search speed by improving the efficiency of precondition checking. We demonstrate the effectiveness of this approach to parallel graph search for domain-independent STRIPS planning.


Parallel Best-First Search: The Role of Abstraction

AAAI Conferences

To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we present a general approach to best-first heuristic search in a shared-memory setting. Each thread attempts to expand the most promising nodes. By using abstraction to partition the state space, we detect duplicate states while avoiding lock contention. We allow speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using an 8-core machine, we show that A* implemented in our framework yields faster search performance than previous parallel search proposals. We also demonstrate that our approach extends easily to other best-first searches, such as weighted A* and anytime heuristic search.


Suboptimal and Anytime Heuristic Search on Multi-Core Machines

AAAI Conferences

In order to scale with modern processors, planning algorithms must become multi-threaded. In this paper, we present parallel shared-memory algorithms for two problems that underlie many planning systems: suboptimal and anytime heuristic search. We extend a recently-proposed approach for parallel optimal search to the suboptimal case, providing two new pruning rules for bounded suboptimal search. We also show how this new approach can be used for parallel anytime search. Using temporal logic, we prove the correctness of our framework, and in an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using an 8-core machine, we show that it yields faster search performance than previous proposals.


Best-First Heuristic Search for Multi-Core Machines

AAAI Conferences

To harness modern multi-core processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we present a general approach to best-first heuristic search in a shared-memory setting. Each thread attempts to expand the most promising open nodes. By using abstraction to partition the state space, we detect duplicate states without requiring frequent locking. We allow speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, verifying its correctness using temporal logic. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using an 8-core machine, we show that A* implemented in our framework yields faster search than improved versions of previous parallel search proposals. Our approach extends easily to other best-first searches, such as Anytime weighted A*.