spry
Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
Finetuning large language models (LLMs) in federated learning (FL) settings has become increasingly important because it allows resource-constrained devices to finetune a model on private data. However, finetuning LLMs with backpropagation requires more memory (especially for intermediate activations) than resource-constrained devices can provide. While Forward-mode Auto-Differentiation (AD) can significantly reduce the memory footprint of activations, we observe that applying it directly to LLM finetuning results in slow convergence and poor accuracy. In this paper, we introduce Spry, an FL algorithm that splits the trainable weights of an LLM among participating clients, so that each client computes forward-mode AD gradients that are closer estimates of the true gradients. Spry achieves a low memory footprint, high accuracy, and fast convergence. We formally prove that the global gradients in Spry are unbiased estimators of the true global gradients when data are homogeneously distributed across clients, while heterogeneity increases the bias of the estimates. We also derive Spry's convergence rate, showing that the gradients decrease in inverse proportion to the number of FL rounds, which indicates convergence up to the limits imposed by heterogeneity. Empirically, Spry reduces the memory footprint during training by 1.4-7.1$\times$ compared with backpropagation while reaching comparable accuracy, across a wide range of language tasks, models, and FL settings. Spry reduces the convergence time by 1.2-20.3$\times$
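The mechanism the abstract describes can be sketched in plain Python. Each client computes a forward-mode AD gradient estimate over only its own share of the trainable weights. This is a minimal illustration under stated assumptions, not Spry's implementation. The `Dual` class, the toy `mse_loss` linear-regression objective, the round-robin weight assignment (`i % n_clients`) in the hypothetical `spry_like_round`, and the single gradient step per round are all assumptions chosen for clarity. Each client draws a random Gaussian tangent $v$ on its own weights only. One forward pass gives the directional derivative $\nabla f \cdot v$ without storing activations for a backward pass, and the client scales $v$ by that value. Because $\mathbb{E}[v v^\top] = I$ on those coordinates, the result is an unbiased estimate of that slice of the gradient.

```python
import random


class Dual:
    """Dual number val + tan*eps with eps**2 == 0; `tan` carries a directional derivative."""

    __slots__ = ("val", "tan")

    def __init__(self, val, tan=0.0):
        self.val = float(val)
        self.tan = float(tan)

    @staticmethod
    def _lift(x):
        # Plain numbers are constants: zero tangent.
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        o = Dual._lift(other)
        return Dual(self.val + o.val, self.tan + o.tan)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual._lift(other)
        return Dual(self.val - o.val, self.tan - o.tan)

    def __mul__(self, other):
        o = Dual._lift(other)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (ab' + a'b)eps
        return Dual(self.val * o.val, self.val * o.tan + self.tan * o.val)

    __rmul__ = __mul__


def mse_loss(w, data):
    """Toy objective (assumption): mean squared error of a linear model."""
    total = 0.0
    for x, y in data:
        err = sum((wj * xj for wj, xj in zip(w, x)), 0.0) - y
        total = total + err * err
    return total * (1.0 / len(data))


def forward_gradient(loss_fn, w, subset, rng):
    """Forward-mode estimate of the gradient restricted to the indices in `subset`.

    A Gaussian tangent v is drawn on `subset` only (zero elsewhere). One forward
    pass with dual numbers yields grad . v, and (grad . v) * v is unbiased for
    the gradient on `subset` because E[v v^T] = I on those coordinates.
    """
    v = {i: rng.gauss(0.0, 1.0) for i in subset}
    duals = [Dual(wi, v.get(i, 0.0)) for i, wi in enumerate(w)]
    jvp = loss_fn(duals).tan
    return {i: jvp * vi for i, vi in v.items()}


def spry_like_round(w, client_data, lr, rng):
    """Hypothetical round: weight i belongs to client i % n_clients (assumption).

    Each client updates only its own slice, so the server can merge the
    disjoint slices directly; one local step per round (assumption).
    """
    n = len(client_data)
    new_w = list(w)
    for c, data in enumerate(client_data):
        subset = [i for i in range(len(w)) if i % n == c]
        grad = forward_gradient(lambda p: mse_loss(p, data), w, subset, rng)
        for i, gi in grad.items():
            new_w[i] -= lr * gi
    return new_w


def demo(seed=0, rounds=500, lr=0.05):
    """Two clients with noiseless data generated by the same linear model."""
    rng = random.Random(seed)
    true_w = [2.0, -1.0, 0.5, 3.0]

    def make_data(k):
        rows = []
        for _ in range(k):
            x = [rng.uniform(-1.0, 1.0) for _ in true_w]
            rows.append((x, sum(a * b for a, b in zip(true_w, x))))
        return rows

    clients = [make_data(32) for _ in range(2)]
    w = [0.0] * len(true_w)
    for _ in range(rounds):
        w = spry_like_round(w, clients, lr, rng)
    return w, true_w


if __name__ == "__main__":
    learned, target = demo()
    print("learned:", [round(x, 3) for x in learned])
    print("target: ", target)
```

Perturbing only a client's slice keeps the tangent dimension small, and this lowers the variance of a forward-gradient estimate compared with perturbing every weight at once. That is consistent with the abstract's claim that splitting weights gives closer estimates. This toy run omits the paper's actual aggregation, local iterations, and LLM-scale details.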
The UNDERWATER drone: $765 gadget can submerge, float like a boat and fly through the air at 40mph
The world's first waterproof drone capable of submerging under water, floating like a boat and flying through the air at over 40mph (64km/h) has been unveiled by US engineers. The $765 (£585) gadget, known as Spry, features a built-in 4K camera that can both record video and snap photos on the fly. Footage is beamed back to a monitor embedded in a waterproof remote control, which the drone's developers claim is another world first for the drone industry. SwellProUSA and Florida-based Urban Drones say it has taken two years of designing and prototyping to cross the line 'between science fiction and reality by allowing users to fly and swim, something never before possible'. 'The Spry's ability to submerge under water and fly in the air makes it the most versatile drone ever created,' said Alex Rodriguez, Urban Drones' CEO.